Predict-Absenteeism-from-work

Predicting the predominant reason for absenteeism from work

In this project, I identify the reasons employees most frequently state that are associated with a much higher chance of excessive absence. To address this problem, I use data analytics to describe the patterns in the data and to predict which employees are likely to be excessively absent, using a regression method (logistic regression, a supervised learning technique).

For the full report of this project, please visit: Absenteeism

Process Summary

Table of Contents:

Problem / Objective Statement
Data Extraction
Data Understanding
Installation
Exploratory Data Analysis
Data Preprocessing
Final Checkpoint
Model Building
Testing Model
Conclusion

Problem Statement:

From a business perspective, employees who are absent from their jobs cost the company more than they should. Absenteeism is a significant problem: it reduces output, and it is disruptive because work has to be rescheduled and programs changed, which is one of the factors that contribute to a department failing to meet its performance targets.

Objective:

Based on these problems, this analysis aims to predict the predominant reason for employees' absenteeism from work. To answer this question, the analysis uses a supervised machine learning method: logistic regression.
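For reference, a binary logistic regression models the probability that an absence is excessive ($y = 1$) from the feature values $x_1, \dots, x_k$ through the logistic (sigmoid) function. Each coefficient $\beta_j$ translates into an odds ratio $e^{\beta_j}$, the factor by which the odds of excessive absence change when $x_j$ increases by one unit, which is what allows reasons to be ranked by how strongly they raise those odds. How $y$ is derived from the hours of absence is a separate modeling choice that this formula does not fix.

$$
P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}},
\qquad
\frac{\operatorname{odds}(x_j + 1)}{\operatorname{odds}(x_j)} = e^{\beta_j},
\quad \text{where } \operatorname{odds} = \frac{P(y = 1 \mid \mathbf{x})}{1 - P(y = 1 \mid \mathbf{x})}
$$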

Data Understanding

• ID : Unique employee identifier

• Reason of absence : Reasons 1-21 are registered in the International Classification of Diseases (ICD)

• Date : Date of Absence

• Transportation Expense : Cost related to business travel (fuel, parking, meals, etc.)

• Distance to work : Distance measured in km

• Age : Years of age

• Daily workload average : Measured in minutes

• Body Mass Index : Weight in kilograms divided by the square of height in meters

• Education : Categorical code for the employee's level of education

• Children : Number of children in the family

• Pets : Number of pets in family

• Absenteeism time in hours : Hours of absence; the target variable
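A small loading sketch that fails early if the raw file is missing any of the fields listed above. The exact header spellings in `EXPECTED_COLUMNS` are an assumption (the real CSV may differ), and `load_absenteeism` is a hypothetical helper, not part of the original project.

```python
import pandas as pd

# Assumed column headers for the fields described above; adjust to the actual CSV header
EXPECTED_COLUMNS = [
    "ID", "Reason for Absence", "Date", "Transportation Expense",
    "Distance to Work", "Age", "Daily Work Load Average", "Body Mass Index",
    "Education", "Children", "Pets", "Absenteeism Time in Hours",
]


def load_absenteeism(source) -> pd.DataFrame:
    """Read the raw data (path or file-like object) and check that every expected column exists."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return df
```

Typical use would be `df = load_absenteeism("Absenteeism_data.csv")`, where the file name is hypothetical. Checking the schema up front makes a renamed or dropped column fail at load time rather than halfway through preprocessing.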

Installation

To install the libraries used in this project, run the following commands:

 pip install pandas
 pip install scikit-learn
 pip install numpy
 pip install seaborn
 pip install matplotlib
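One detail worth checking after installation: scikit-learn is installed under the name `scikit-learn` but imported as `sklearn`. The short script below is a sketch (the `check_installed` helper is hypothetical) that uses `importlib.util.find_spec` to report which of the five libraries Python can locate, without actually importing them.

```python
import importlib.util

# pip (PyPI) package name -> module name used in `import`; only scikit-learn differs
PACKAGES = {
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "seaborn": "seaborn",
    "matplotlib": "matplotlib",
}


def check_installed() -> dict:
    """Map each pip package name to True if its module can be found, else False."""
    return {pip_name: importlib.util.find_spec(module) is not None
            for pip_name, module in PACKAGES.items()}


if __name__ == "__main__":
    for pip_name, found in check_installed().items():
        print(f"{pip_name:<13} {'OK' if found else 'missing: pip install ' + pip_name}")
```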

Exploratory Data Analysis

At this stage, a brief analysis of the data is carried out:

Data Distribution - Histogram
Boxplot
Data Correlation ![Data correlation heatmap](https://github.com/L-VinayKumar/Predict-Absenteeism-from-work/blob/main/Predict-Absenteeism/Data-Corr.png?raw=true)
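The three plots in this stage rest on a few simple summaries. The sketch below computes them with pandas and NumPy, so they can be inspected without drawing anything: histogram bin counts, the values a boxplot would flag as outliers (the usual rule of 1.5 × IQR beyond the quartiles), and each numeric column's Pearson correlation with the target. The function names and the example column name in the final comment are hypothetical.

```python
import numpy as np
import pandas as pd


def histogram_counts(s: pd.Series, bins: int = 10):
    """Bin counts and bin edges: the numbers behind a histogram."""
    return np.histogram(s.dropna(), bins=bins)


def boxplot_outliers(s: pd.Series) -> pd.Series:
    """Values outside the standard boxplot whiskers (1.5 * IQR beyond Q1 and Q3)."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]


def correlation_with_target(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest (by |r|) first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Plotting equivalents (need matplotlib and seaborn installed):
#   df.hist(bins=10)                                             # histograms
#   sns.boxplot(x=df["Transportation Expense"])                  # one boxplot
#   sns.heatmap(df.select_dtypes("number").corr(), annot=True)   # correlation heatmap
```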

Data Preprocessing

At this stage, data preparation and processing will be carried out before being used as a data model

Encode
Casting Data Type
Categorization
Extract Date feature
Final Checkpoint
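The preprocessing steps listed above could look roughly like the following pandas sketch. Every concrete choice in it is an assumption rather than the author's confirmed method: the column names, the day-first `dd/mm/yyyy` date format, the four reason-code groups (including a group for any codes above 21), and the rule that Education code 1 means high school while higher codes mean further education.

```python
import pandas as pd

# Hypothetical grouping of reason codes into four dummy columns (inclusive bounds)
REASON_GROUPS = {
    "Reason_1": (1, 14),
    "Reason_2": (15, 17),
    "Reason_3": (18, 21),
    "Reason_4": (22, 28),  # any codes above 21, if present in the data
}


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Encode: grouped one-hot encoding of the reason code; code 0 falls in no group
    reason = out.pop("Reason for Absence")
    for name, (low, high) in REASON_GROUPS.items():
        out[name] = reason.between(low, high).astype(int)

    # Casting data type: parse day-first date strings into datetime64 values
    dates = pd.to_datetime(out.pop("Date"), format="%d/%m/%Y")

    # Extract date features: month number (1-12) and weekday (Monday=0 ... Sunday=6)
    out["Month Value"] = dates.dt.month
    out["Day of the Week"] = dates.dt.weekday

    # Categorization: 0 = high school (code 1), 1 = any higher level of education
    out["Education"] = (out["Education"] > 1).astype(int)

    # The employee ID identifies rows but is not used as a feature
    return out.drop(columns=["ID"])
```

Grouping keeps the number of dummy columns small; with one dummy per code, the model would have to estimate a separate coefficient for every reason. A final checkpoint can then be as simple as saving the result, for example `preprocess(raw_df).to_csv("absenteeism_preprocessed.csv", index=False)`, where both names are hypothetical.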

Model Building

Preprocessing the new Dataset
Creating the Targets
Standardizing the data
Training the logistic regression model
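A minimal scikit-learn sketch of these model-building steps, assuming a fully numeric preprocessed table with an `Absenteeism Time in Hours` column. Defining "excessive" absence as more hours than the median is an assumption, and `make_targets`, `fit_model`, and `odds_ratios` are hypothetical helper names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

TARGET = "Absenteeism Time in Hours"  # assumed column name


def make_targets(hours: pd.Series) -> np.ndarray:
    """Creating the targets: 1 = more hours than the median (assumed cut-off), 0 = otherwise."""
    return (hours > hours.median()).astype(int).to_numpy()


def fit_model(df: pd.DataFrame):
    """Standardize all features, then fit a logistic regression on the binary target."""
    X = df.drop(columns=[TARGET])
    y = make_targets(df[TARGET])
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X, y)
    return model, list(X.columns)


def odds_ratios(model, columns) -> pd.Series:
    """exp(coefficient) per feature, largest first. Values above 1 raise the odds of
    excessive absence per one-standard-deviation increase of the (scaled) feature."""
    coefs = model.named_steps["logisticregression"].coef_[0]
    return pd.Series(np.exp(coefs), index=columns).sort_values(ascending=False)
```

Because `StandardScaler` here also rescales any 0/1 reason dummies, their odds ratios refer to one standard deviation rather than to the presence of a reason. Restricting scaling to the continuous columns with `sklearn.compose.ColumnTransformer` is a common alternative when the reason coefficients need to be read directly.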

Prediction Results

At this stage, the model selected in the previous step is used to make predictions on a new dataset:

Predict New Dataset:
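Predicting on a new dataset amounts to running it through the same preprocessing and then asking the fitted model for class probabilities. The sketch below assumes a fitted scikit-learn classifier that exposes `predict_proba` with classes ordered `[0, 1]`; `predict_new`, the `Probability` and `Prediction` column names, and the default 0.5 cut-off are illustrative choices.

```python
import pandas as pd


def predict_new(model, columns, new_df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Attach the predicted probability of excessive absence and the resulting 0/1 label."""
    out = new_df.copy()
    # Select the feature columns in the same order that was used when fitting
    proba = model.predict_proba(out[columns])[:, 1]
    out["Probability"] = proba
    out["Prediction"] = (proba >= threshold).astype(int)
    return out
```

Keeping the probability next to the 0/1 prediction allows the cut-off to be tuned later, for example lowered to flag more employees at risk of excessive absence.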

Conclusion

We conclude with our results:
