Campus Placement Dataset Analysis

Dataset

The first thing to notice is that when the status is "Not Placed", the salary appears to be NaN. This is a reminder not to treat it as missing data. We still need to check whether any "Placed" student has a NaN salary. The categorical data will also need some encoding.

The first column seems to be just an ID, so it will be dropped.
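A minimal pandas sketch of the NaN check and the ID drop, assuming the data sits in a DataFrame that uses this notebook's `PlacementStatus` and `Salary` column names. The ID column name `StudentID` and the toy rows are hypothetical.

```python
import pandas as pd


def initial_checks(df: pd.DataFrame, id_col: str) -> pd.DataFrame:
    """Report salary/status mismatches and drop the ID column."""
    # A "Placed" row without a salary would be genuinely missing data.
    placed_no_salary = df["PlacementStatus"].eq("Placed") & df["Salary"].isna()
    # A "Not Placed" row with a salary would contradict the NaN pattern.
    unplaced_with_salary = df["PlacementStatus"].eq("Not Placed") & df["Salary"].notna()
    print(f"Placed with NaN salary: {int(placed_no_salary.sum())}")
    print(f"Not Placed with a salary: {int(unplaced_with_salary.sum())}")
    # The ID column only identifies rows, so it carries no predictive signal.
    return df.drop(columns=[id_col])


# Hypothetical toy rows, not taken from the real dataset.
toy = pd.DataFrame({
    "StudentID": [1, 2, 3],
    "PlacementStatus": ["Placed", "Not Placed", "Placed"],
    "Salary": [270000.0, None, 200000.0],
})
clean = initial_checks(toy, id_col="StudentID")
```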

Let's take a closer look at the feature information.

Categorical features

Gender
Lower Secondary Education
Higher Secondary Education
Specialization
Degree Specialization
Work Experience
Placement Status

Exploratory Analysis

Description
We will fill missing salaries with 0s, as they represent students who did not receive a placement offer.

Let's analyse the distribution of the PlacementStatus column
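One way to do the fill safely is to first confirm that every NaN salary belongs to a "Not Placed" student, then compute the status distribution. This is a sketch with hypothetical toy rows; `fill_unplaced_salaries` is an illustrative helper name, not part of any library.

```python
import pandas as pd


def fill_unplaced_salaries(df: pd.DataFrame) -> pd.DataFrame:
    """Replace NaN salaries with 0 after checking none belong to placed students."""
    placed_missing = df["PlacementStatus"].eq("Placed") & df["Salary"].isna()
    if placed_missing.any():
        # A placed student without a salary would be genuinely missing data.
        raise ValueError(f"{int(placed_missing.sum())} placed students have no salary")
    out = df.copy()
    out["Salary"] = out["Salary"].fillna(0)
    return out


# Hypothetical toy rows for illustration only.
toy = pd.DataFrame({
    "PlacementStatus": ["Placed", "Not Placed", "Placed", "Not Placed"],
    "Salary": [250000.0, None, 300000.0, None],
})
filled = fill_unplaced_salaries(toy)
# Share of each placement status: the distribution analysed in this section.
status_share = filled["PlacementStatus"].value_counts(normalize=True)
print(status_share)
```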

Converting rupees to US$ (1 rupee = $0.013)
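The conversion is a single multiplication. This sketch applies the 0.013 rate quoted above to the salary levels that come up later in this analysis (2.1, 2.5, 7 and 10 lakhs, where 1 lakh = 100,000 rupees). The rate is a fixed assumption; real exchange rates move over time.

```python
import pandas as pd

# Rate quoted in the text; the real exchange rate changes over time.
RUPEE_TO_USD = 0.013


def inr_to_usd(amount_inr):
    """Convert rupees (a number or a pandas Series) to US dollars."""
    return amount_inr * RUPEE_TO_USD


# 2.1, 2.5, 7 and 10 lakhs per annum, written out in rupees.
salaries_inr = pd.Series([210_000, 250_000, 700_000, 1_000_000])
salaries_usd = inr_to_usd(salaries_inr).round(2)
print(salaries_usd.tolist())  # [2730.0, 3250.0, 9100.0, 13000.0]
```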

Gender based analysis

Swarmplot of Gender VS Salary
Lower Secondary Grade Based
Higher Secondary Grade Based
Undergraduate Degree Based
Specialization
Post Graduation Based
Specialisation

MBA Specialisation

Mean

Observation

Men are getting higher salaries than women, even though women scored higher percentages during their school and college degrees.

1. The salary range is wide for men, with a median of 2.5 lakhs per annum.
2. The median salary for women is 2.1 lakhs per annum.
3. The highest package, nearly 10 lakhs per annum, was offered to a man.
4. The highest package offered to a woman is close to 7 lakhs per annum.

Women who came from male-dominated fields receive lower salaries, but receive higher salaries if they came from "Others" fields. However, the "Others" group is too small to give a conclusive answer; more data would be required.
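These figures can be reproduced with a `groupby`. A sketch, assuming the medians and maxima are taken over placed students only, since the 0-filled salaries of unplaced students would otherwise pull the medians down. The toy rows are made up.

```python
import pandas as pd


def salary_summary_by_gender(df: pd.DataFrame) -> pd.DataFrame:
    """Median, maximum and count of salaries per gender, placed students only."""
    # Unplaced students have a 0 (filled) salary and would drag the medians down.
    placed = df[df["PlacementStatus"] == "Placed"]
    return placed.groupby("Gender")["Salary"].agg(["median", "max", "count"])


# Hypothetical toy rows for illustration only.
toy = pd.DataFrame({
    "Gender": ["M", "M", "M", "F", "F", "F"],
    "PlacementStatus": ["Placed", "Placed", "Placed", "Placed", "Placed", "Not Placed"],
    "Salary": [250000, 300000, 200000, 210000, 180000, 0],
})
summary = salary_summary_by_gender(toy)
print(summary)
```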

Placement status based analysis

Lower Secondary Education Based
Swarmplot of Secondary Education and Salary
Higher Secondary Education Based

Swarmplot of Higher Education and Salary

Undergraduate Degree Based

Swarmplot Of Undergraduate Education field and salary

Swarmplot of Work Experience and Salary

Post Graduation Based

Swarmplot Of Post graduation and salary

Employability Test Percentage

Observation

Students with higher percentages (better academic results) performed better during placements than those with relatively lower academic results.
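A quick way to back this observation with numbers is to compare average grades between placed and unplaced students. `DegreePercentage` is a hypothetical column name, `MBAGrade` follows the notebook's naming, and the toy values are invented.

```python
import pandas as pd


def mean_grades_by_status(df: pd.DataFrame, grade_cols: list) -> pd.DataFrame:
    """Average of each grade column for placed vs. not placed students."""
    means = df.groupby("PlacementStatus")[grade_cols].mean()
    # Positive gap means placed students scored higher on average.
    gap = means.loc["Placed"] - means.loc["Not Placed"]
    return means.T.assign(gap=gap)


# Hypothetical toy rows for illustration only.
toy = pd.DataFrame({
    "PlacementStatus": ["Placed", "Placed", "Not Placed", "Not Placed"],
    "DegreePercentage": [75.0, 71.0, 60.0, 62.0],
    "MBAGrade": [66.0, 64.0, 58.0, 60.0],
})
table = mean_grades_by_status(toy, ["DegreePercentage", "MBAGrade"])
print(table)
```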

Salary vs Academic results

Does the grade in school affect your future life?
Lower Secondary Education vs Salary
Higher Secondary Education vs Salary
Undergraduate vs Salary
Post Graduation vs Salary

Which stream students are getting more placed and which stream students are mostly not placed?

Observation

There seems to be no relation between grades in previous education and salary. Students with an average percentage of 60-70 are able to get around 250,000 INR annually, and a higher percentage does not necessarily correspond to a higher salary package.

The Secondary board the student came from seems to have no impact here, although the outliers came from the central board, which is interesting. In Higher Secondary education, the Science specialization leads to a better salary than the other options (especially Arts).

Students are mostly placed from the Commerce and Management stream, followed by Science and Technology. Students from other streams receive fewer placements, largely because there are fewer students in those streams.
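Raw counts and within-stream rates answer different questions: the counts show where most placements come from, while the rates show whether a smaller stream is placed less often or simply has fewer students. A sketch with `pd.crosstab`; the column name `DegreeSpecialization` and the toy rows are hypothetical.

```python
import pandas as pd


def placement_by_stream(df: pd.DataFrame, stream_col: str):
    """Counts and within-stream placement rates for each degree stream."""
    counts = pd.crosstab(df[stream_col], df["PlacementStatus"])
    # normalize="index" divides each row by its total, giving rates per stream.
    rates = pd.crosstab(df[stream_col], df["PlacementStatus"], normalize="index")
    return counts, rates


# Hypothetical toy rows for illustration only.
toy = pd.DataFrame({
    "DegreeSpecialization": [
        "Commerce and Management", "Commerce and Management", "Commerce and Management",
        "Science and Technology", "Science and Technology", "Others",
    ],
    "PlacementStatus": ["Placed", "Placed", "Not Placed", "Placed", "Not Placed", "Not Placed"],
})
counts, rates = placement_by_stream(toy, "DegreeSpecialization")
print(counts)
print(rates)
```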

Does percentage in college determine salary?

Relation between EmployabilityTest and PlacementStatus

Surprisingly, the employability test does not seem to be a major factor in the placement rate: many students with good scores did not land a job. The institution may need to rethink this test.

Salary Vs Employability Test

There is a positive relation between salary and the employability test score. It is modest, but visible.
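The strength of that relation can be summarised by a correlation coefficient computed on placed students only. A sketch; Spearman's rank correlation is offered as an option because it is less sensitive to the salary outliers noted earlier. The toy rows are invented.

```python
import pandas as pd


def salary_correlation(df: pd.DataFrame, col: str, method: str = "pearson") -> float:
    """Correlation between Salary and `col`, computed on placed students only."""
    # Including unplaced students (salary 0) would mix placement and pay effects.
    placed = df[df["PlacementStatus"] == "Placed"]
    return placed[["Salary", col]].corr(method=method).iloc[0, 1]


# Hypothetical toy rows for illustration only.
toy = pd.DataFrame({
    "EmployabilityTest": [60, 70, 80, 90, 85, 95],
    "Salary": [200000, 240000, 250000, 300000, 0, 0],
    "PlacementStatus": ["Placed", "Placed", "Placed", "Placed", "Not Placed", "Not Placed"],
})
r = salary_correlation(toy, "EmployabilityTest")
rho = salary_correlation(toy, "EmployabilityTest", method="spearman")
print(round(r, 3), round(rho, 3))
```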

Is the MBAGrade important?

The grade is indeed important for salary.
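A sketch of one way to check this without assuming a linear relation: split placed students into `MBAGrade` quartiles with `pd.qcut` and compare median salaries. The toy rows are invented for illustration.

```python
import pandas as pd


def median_salary_by_grade_quartile(df: pd.DataFrame, grade_col: str = "MBAGrade") -> pd.Series:
    """Median salary of placed students in each quartile of `grade_col`."""
    placed = df[df["PlacementStatus"] == "Placed"]
    # qcut splits the grades into four groups of (roughly) equal size.
    quartile = pd.qcut(placed[grade_col], q=4, labels=["Q1", "Q2", "Q3", "Q4"])
    return placed.groupby(quartile, observed=True)["Salary"].median()


# Hypothetical toy rows for illustration only.
toy = pd.DataFrame({
    "MBAGrade": [55, 58, 60, 62, 65, 68, 70, 74, 72],
    "Salary": [200000, 210000, 220000, 240000, 250000, 260000, 300000, 350000, 0],
    "PlacementStatus": ["Placed"] * 8 + ["Not Placed"],
})
by_quartile = median_salary_by_grade_quartile(toy)
print(by_quartile)
```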

How about work experience?

Observation

Work experience is really important! Students who have worked before have a much higher placement rate and higher salaries.
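Both claims, placement rate and salary, can be checked with two group-bys. A `WorkExperience` column with values "Yes"/"No" is a hypothetical encoding, and the toy rows are invented.

```python
import pandas as pd


def work_experience_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Placement rate and median salary of placed students, by work experience."""
    placed_flag = df["PlacementStatus"].eq("Placed")
    # Mean of a boolean flag per group is the placement rate of that group.
    rate = placed_flag.groupby(df["WorkExperience"]).mean()
    median_salary = df[placed_flag].groupby("WorkExperience")["Salary"].median()
    return pd.DataFrame({"placement_rate": rate, "median_placed_salary": median_salary})


# Hypothetical toy rows for illustration only.
toy = pd.DataFrame({
    "WorkExperience": ["Yes", "Yes", "Yes", "No", "No", "No", "No"],
    "PlacementStatus": ["Placed", "Placed", "Not Placed", "Placed",
                        "Not Placed", "Not Placed", "Not Placed"],
    "Salary": [300000, 280000, 0, 240000, 0, 0, 0],
})
summary = work_experience_summary(toy)
print(summary)
```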

Encoding

We will label encode (no new columns) the categorical features that have only two unique values, e.g. Gender M/F, and one-hot encode every other categorical column (columns with more than two unique values).
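A minimal sketch of this encoding rule in pandas, assuming every non-numeric column is categorical. Mapping the two sorted values to 0/1 matches what scikit-learn's `LabelEncoder` does for a binary column; note that `PlacementStatus` then becomes "Not Placed" = 0 and "Placed" = 1. The column `DegreeSpecialization` and the toy rows are hypothetical.

```python
import pandas as pd


def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Label encode two-valued categoricals in place; one-hot encode the rest."""
    out = df.copy()
    cat_cols = [c for c in out.columns if not pd.api.types.is_numeric_dtype(out[c])]
    multi_valued = []
    for col in cat_cols:
        values = sorted(out[col].dropna().unique())
        if len(values) == 2:
            # Label encoding: the two sorted values become 0 and 1 in the same column.
            out[col] = out[col].map({values[0]: 0, values[1]: 1})
        elif len(values) > 2:
            multi_valued.append(col)
    # One-hot encoding: one new 0/1 column per category value.
    return pd.get_dummies(out, columns=multi_valued, dtype=int)


# Hypothetical toy rows for illustration only.
toy = pd.DataFrame({
    "Gender": ["M", "F", "M"],
    "PlacementStatus": ["Placed", "Not Placed", "Placed"],
    "DegreeSpecialization": ["Commerce and Management", "Science and Technology", "Others"],
    "MBAGrade": [65.0, 58.0, 70.0],
})
encoded = encode_categoricals(toy)
print(encoded.columns.tolist())
```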

Correlations

Heat Map for checking correlation

Here we are looking at PlacementStatus, so Salary is not relevant and we will ignore its NaN values.
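The heatmap is a picture of `DataFrame.corr()`; ranking one column of that matrix gives the same information as a sorted list. A sketch, assuming the data is already numerically encoded; the toy columns `Grade` and `Noise` are invented. The full matrix can be passed to `seaborn.heatmap` for the plot.

```python
import pandas as pd


def rank_correlations(encoded: pd.DataFrame, target: str = "PlacementStatus") -> pd.Series:
    """Correlation of every numeric feature with `target`, strongest first."""
    # Salary is ignored here, as in the text, so its NaNs do not matter.
    corr = encoded.drop(columns=["Salary"], errors="ignore").corr()[target]
    return corr.drop(target).sort_values(key=abs, ascending=False)


# Hypothetical toy rows for illustration only.
toy = pd.DataFrame({
    "PlacementStatus": [1, 0, 1, 0, 1],
    "Grade": [80, 60, 78, 62, 75],
    "Noise": [3, 1, 2, 5, 4],
    "Salary": [250000.0, None, 260000.0, None, 240000.0],
})
ranking = rank_correlations(toy)
print(ranking)
```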

Observations

These results show which features relate most to what we are trying to predict. In the case of PlacementStatus, high grades during education (especially basic education) are a major factor in employability, alongside work experience and the choice of MBA field. In the case of salary, the major factors are the specialization and the most recent grades.

Questions

What are the major factors that lead to a person being hired?

We already answered this question in the data analysis phase. Things like MBA area of specialization, experience and grades are the most important here.

We will try to predict whether someone who has just graduated from the MBA gets employed.

First, the preprocessing: remove the salary and any other feature that does not seem to be related to the problem. Then, standardize the data.
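A sketch of this preprocessing plus a classifier, using scikit-learn. The text does not name a model, so logistic regression is an assumption here, and the synthetic data only stands in for the encoded dataset. Putting `StandardScaler` inside the pipeline means it is refitted on each training fold, which avoids leaking test-fold statistics into the scaling.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def placement_cv_scores(encoded: pd.DataFrame, extra_drop=(), cv: int = 5) -> np.ndarray:
    """Cross-validated accuracy of a scaled logistic regression on PlacementStatus."""
    # Remove the target, the salary, and any other unrelated features.
    X = encoded.drop(columns=["PlacementStatus", "Salary", *extra_drop], errors="ignore")
    y = encoded["PlacementStatus"]
    # The scaler sits inside the pipeline so it is fitted on training folds only.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, X, y, cv=cv)


# Synthetic stand-in data: placement depends on a grade plus noise.
rng = np.random.default_rng(0)
grade = rng.normal(65, 8, 200)
synthetic = pd.DataFrame({
    "MBAGrade": grade,
    "WorkExperience": rng.integers(0, 2, 200),
    "PlacementStatus": (grade + rng.normal(0, 5, 200) > 65).astype(int),
    "Salary": rng.normal(250000, 30000, 200),
})
scores = placement_cv_scores(synthetic)
print(scores.mean())
```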

What are the major factors that affect the salary?

As we saw in the data analysis phase, the most important features for this problem are the MBA and Employability Test grades, the degree and MBA areas of specialization, gender, and work experience. We will build a model that predicts the salary of someone who has just graduated from the MBA (assuming this person got hired).
Correlation returns NaN if all values are the same, which is the case with PlacementStatus once only placed students are kept.
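A small demonstration with invented rows: after filtering to placed students, `PlacementStatus` has zero variance, so pandas reports its correlation as NaN and the column can be dropped.

```python
import math

import pandas as pd

# Hypothetical encoded rows (PlacementStatus: 1 = Placed, 0 = Not Placed).
encoded = pd.DataFrame({
    "PlacementStatus": [1, 1, 1, 0],
    "MBAGrade": [65.0, 70.0, 60.0, 55.0],
    "Salary": [250000.0, 300000.0, 200000.0, 0.0],
})

# Keep only hired students, since the salary model assumes the person got hired.
placed = encoded[encoded["PlacementStatus"] == 1]
corr_with_salary = placed.corr()["Salary"]
# PlacementStatus is now constant (all 1), so its correlation is undefined.
print(math.isnan(corr_with_salary["PlacementStatus"]))  # True

# Drop the constant column before modelling.
placed = placed.drop(columns=["PlacementStatus"])
```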
Linear Regression
These results show that plain Linear Regression performs better here than polynomial regression.
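A sketch of how such a comparison can be run with cross-validation, assuming a degree-2 polynomial and mean absolute error as the metric (the text does not say which degree or metric was used). The synthetic data is a stand-in; on a small dataset, the extra polynomial terms can easily overfit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def compare_linear_vs_polynomial(X, y, degree: int = 2, cv: int = 5) -> dict:
    """Mean cross-validated MAE (lower is better) for linear and polynomial models."""
    models = {
        "linear": make_pipeline(StandardScaler(), LinearRegression()),
        f"poly{degree}": make_pipeline(
            StandardScaler(),
            PolynomialFeatures(degree=degree, include_bias=False),
            LinearRegression(),
        ),
    }
    results = {}
    for name, model in models.items():
        # scikit-learn maximises scores, so MAE is returned as a negative number.
        scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
        results[name] = -scores.mean()
    return results


# Synthetic stand-in data with a linear relationship plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = 250000 + X @ np.array([20000.0, 10000.0, 5000.0, 0.0]) + rng.normal(0, 15000, 60)
results = compare_linear_vs_polynomial(X, y)
print(results)
```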

Observation

The results are not great, but the small amount of data does not help.

Can the institution predict whether a person will be successful before they are accepted?

Here, the idea is that the institution would like to know beforehand whether someone would fail to find a job after the MBA. The objective is to avoid accepting people who would not find a job and so increase the institution's employability rate. For that, we will keep only the information available before the MBA and see whether it is enough to predict a candidate's employability.

The knowledge from the data analysis phase can tell us which features are best.

"Things like MBA area of specialization, experience and grades are the most important here."

Here we are looking at PlacementStatus, so Salary is not relevant and we will ignore its NaN values.
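A sketch of the pre-MBA experiment, using hypothetical column names for the information available before the MBA (which columns qualify is a judgment call). Stratified, shuffled folds keep the placed/not-placed ratio stable across folds, and balanced accuracy is used in case the classes are imbalanced; logistic regression is again an assumed model choice, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical names for columns known before the MBA starts.
PRE_MBA_COLUMNS = ("Gender", "SecondaryPercentage", "HigherSecondaryPercentage",
                   "DegreePercentage", "WorkExperience")


def pre_mba_scores(encoded, feature_cols=PRE_MBA_COLUMNS, n_splits=5, seed=0):
    """Balanced accuracy of predicting placement from pre-MBA features only."""
    X = encoded[list(feature_cols)]
    y = encoded["PlacementStatus"]
    # Shuffled, stratified folds keep the class ratio similar in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")


# Synthetic stand-in data, for illustration only.
rng = np.random.default_rng(2)
n = 150
data = {col: rng.normal(65, 10, n) for col in PRE_MBA_COLUMNS[1:4]}
data["Gender"] = rng.integers(0, 2, n)
data["WorkExperience"] = rng.integers(0, 2, n)
latent = 0.05 * data["DegreePercentage"] + 1.0 * data["WorkExperience"] + rng.normal(0, 1, n)
data["PlacementStatus"] = (latent > np.median(latent)).astype(int)
synthetic = pd.DataFrame(data)
scores = pre_mba_scores(synthetic)
print(scores.round(3))
```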