In this project, I share the reasons employees most frequently give for absence and identify which of them are associated with a much higher chance of excessive absenteeism. To do this, I use data analytics to describe the patterns and to predict which employees will be absent, using a regression method (supervised learning).
For the full report on this project, please visit: Absenteeism
Table of Contents:
1. Problem / Objective Statement
2. Data Extraction
3. Data Understanding
4. Installation
5. Exploratory Data Analysis
6. Data Preprocessing
7. Final Checkpoint
8. Building Model
9. Testing Model
10. Conclusion
From a business perspective, employees who are not present to do their jobs cost the company more than they should. Absence is a significant problem because it reduces output and forces rescheduling and changes to planned work, which contributes to a department failing to meet its performance targets.
Given these problems, this analysis aims to predict the predominant reasons for employee absenteeism from work. It uses a supervised machine learning method: logistic regression.
• ID : Individual identification
• Reason of absence : Reasons 1-21 are registered in the International Classification of Diseases (ICD)
• Date : Date of Absence
• Transportation Expense : Cost related to business travel (fuel, parking, meals, etc)
• Distance to work : Distance measured in km
• Age : Years of age
• Daily workload average : Measured in minutes
• Body Mass Index : Number based on your weight and height
• Education : Representing different levels of education
• Children : Number of children in the family
• Pets : Number of pets in the family
• Absenteeism time in hours : Target
To install the libraries used in this project, run the following commands:
pip install pandas
pip install scikit-learn
pip install numpy
pip install seaborn
pip install matplotlib
At this stage, a brief analysis of the data is carried out:
1. Data Distribution - Histogram
2. Boxplot
3. Data Correlation ![Data correlation](https://github.com/L-VinayKumar/Predict-Absenteeism-from-work/blob/main/Predict-Absenteeism/Data-Corr.png?raw=true)
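
The three checks above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the data frame is filled with randomly generated values (an assumption, used only so the snippet runs on its own), and only the column names are taken from the data dictionary. The plots use pandas and matplotlib; seaborn could be used for the same purpose.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): names follow the data dictionary,
# values are random and do NOT come from the real data set.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.integers(25, 60, size=200),
    "Distance to work": rng.integers(5, 55, size=200),
    "Body Mass Index": rng.integers(19, 38, size=200),
    "Absenteeism time in hours": rng.integers(0, 40, size=200),
})

# 1. Data distribution: one histogram per numeric column
hist_axes = df.hist(bins=20, figsize=(8, 6))

# 2. Boxplot: shows the spread and possible outliers of each column
box_fig, box_ax = plt.subplots(figsize=(8, 4))
df.boxplot(ax=box_ax, rot=45)

# 3. Data correlation: pairwise Pearson correlation drawn as a heatmap
corr = df.corr()
corr_fig, corr_ax = plt.subplots(figsize=(6, 5))
image = corr_ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
corr_ax.set_xticks(range(len(corr.columns)))
corr_ax.set_xticklabels(corr.columns, rotation=45, ha="right")
corr_ax.set_yticks(range(len(corr.columns)))
corr_ax.set_yticklabels(corr.columns)
corr_fig.colorbar(image, ax=corr_ax)
```

To view the figures, call `fig.savefig(...)` on each figure, or remove the `matplotlib.use("Agg")` line and call `plt.show()` in an interactive session.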
At this stage, the data is prepared and processed before it is used to build the model:
1. Encode
2. Casting Data Type
3. Categorization
4. Extract Date feature
5. Final Checkpoint
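
As a rough illustration of these steps, the sketch below runs them on a tiny hand-made table. Several things here are assumptions rather than details from the project: the sample rows, the `%d/%m/%Y` date format, the one-hot encoding of the absence reason, and the choice to collapse `Education` into a binary flag as the categorization step.

```python
import pandas as pd

# Hand-made sample rows (assumption): column names follow the data dictionary.
raw = pd.DataFrame({
    "Reason of absence": [1, 7, 14, 21, 7],
    "Date": ["07/07/2015", "14/07/2015", "15/07/2015", "16/07/2015", "23/07/2015"],
    "Education": [1, 1, 3, 2, 1],
    "Absenteeism time in hours": [4, 0, 2, 8, 40],
})
df = raw.copy()

# Casting data type: parse the date strings into real datetime values
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Extract date features: month and day of week (Monday = 0), then drop the raw date
df["Month"] = df["Date"].dt.month
df["Day of week"] = df["Date"].dt.dayofweek
df = df.drop(columns="Date")

# Categorization (assumed rule): 0 = lowest education level, 1 = any higher level
df["Education"] = (df["Education"] > 1).astype(int)

# Encode: turn the categorical reason codes into one indicator column per reason
reason_dummies = pd.get_dummies(df["Reason of absence"], prefix="Reason", dtype=int)
df = pd.concat([df.drop(columns="Reason of absence"), reason_dummies], axis=1)

# Final checkpoint: keep a clean copy of the preprocessed data
df_preprocessed = df.copy()
print(df_preprocessed)
```

Keeping a copy at the checkpoint means later modelling experiments can start from the same preprocessed table without repeating these steps.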
At this stage, the previously selected model is used to make predictions on a new data set.
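
A minimal sketch of building a logistic regression model and then predicting on unseen rows might look like the following. It is illustrative only: the data is randomly generated, the binary target (absence above the median number of hours counts as excessive) is an assumed definition, and the feature subset and scaling step are choices made for this example rather than the project's confirmed setup.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): random values, dictionary column names.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "Transportation Expense": rng.integers(100, 400, size=n),
    "Distance to work": rng.integers(5, 55, size=n),
    "Age": rng.integers(25, 60, size=n),
    "Children": rng.integers(0, 4, size=n),
    "Pets": rng.integers(0, 3, size=n),
    "Absenteeism time in hours": rng.integers(0, 40, size=n),
})

# Assumed target definition: 1 = absence above the median ("excessive"), 0 = otherwise
threshold = df["Absenteeism time in hours"].median()
y = (df["Absenteeism time in hours"] > threshold).astype(int)
X = df.drop(columns="Absenteeism time in hours")

# Hold out 20% of the rows to act as the "new" data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale the features, then fit the logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Testing: accuracy on held-out rows, plus predictions for a few unseen employees
accuracy = model.score(X_test, y_test)
new_data = X_test.head(5)
predictions = model.predict(new_data)
probabilities = model.predict_proba(new_data)[:, 1]

print("Test accuracy:", accuracy)
print("Predicted classes:", predictions)
print("Probability of excessive absence:", probabilities)
```

Because the values here are random, the accuracy is not meaningful; with the real data, the same pattern would show how well the model generalizes to employees it has not seen.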
We conclude with our results: