Data Analysis

• Size: 400 KB
• Shape: 613 rows and 13 columns, one of which is the target
• Data Source: DPhi
• Platform: Jupyter & Google Colab

Data Visualization

• Matplotlib & Seaborn to visualize.
• SweetViz for reporting.

Default Prediction

• Estimate your chances of loan approval by providing a few necessary details.


The director of SZE bank has found that going through loan applications to decide who should be granted a loan and who should be rejected is a tedious and time-consuming process.

He wants to automate it and increase his bank’s efficiency. After he asks around a bit, your name pops up as one of the few data scientists who can make this possible within a limited time. Will you help the director out?


Get Knowledge about Loan Dataset

  • Select the report to view the EDA for the whole dataset
  • Select JoinReport to view the EDA for the training dataset
A Few Insights

Using a Machine Learning Model: Logistic Regression

The AI system tries to predict which applicants can be granted a loan and which should be rejected.

  • We see that the most correlated variables are:
    • ApplicantIncome with LoanAmount
    • Credit_History with Loan_Status
    • LoanAmount is also correlated with CoapplicantIncome
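
A correlation check of this kind can be sketched with pandas as follows. This is a minimal illustration, not the project's actual notebook code: the rows are toy values invented for the example (they are not taken from the DPhi dataset), and strongest_pairs is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; NOT taken from the real dataset.
df = pd.DataFrame({
    "ApplicantIncome":   [2500, 4000, 6000, 8000, 12000, 3000],
    "CoapplicantIncome": [0, 1500, 2000, 0, 3000, 1000],
    "LoanAmount":        [70, 110, 160, 180, 300, 90],
    "Credit_History":    [1, 1, 0, 1, 1, 0],
    "Loan_Status":       ["Y", "Y", "N", "Y", "Y", "N"],
})

# Encode the target as 0/1 so it can take part in the correlation matrix.
df["Loan_Status"] = df["Loan_Status"].map({"Y": 1, "N": 0})


def strongest_pairs(frame, n=3):
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = frame.corr().abs()
    # Keep only the upper triangle so each pair appears once and the
    # diagonal (every column with itself) is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(ascending=False).head(n)


top = strongest_pairs(df)
print(top)
```

On the real data, the same call on the numeric columns would list the pairs named above if the reported correlations hold.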

Feature Engineering and Models: Logistic Regression, Decision Tree, Random Forest

Feature engineering introduces some new features:

  • Total Income: As discussed during bivariate analysis, we combine the Applicant Income and Coapplicant Income. If the total income is high, the chances of loan approval might also be high.
  • EMI: The monthly amount the applicant pays to repay the loan. The idea behind this variable is that people with high EMIs might find it difficult to pay back the loan. We can calculate the EMI as the ratio of the loan amount to the loan amount term.
  • Balance Income: The income left after the EMI has been paid. The idea behind this variable is that if this value is high, the person is more likely to repay the loan, which increases the chances of loan approval.
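
The three features described above could be computed roughly like this. It is a sketch under stated assumptions: the column name Loan_Amount_Term, the helper name add_engineered_features, and the loan_amount_scale parameter are not given in the text. The scale exists because LoanAmount and the income columns may be recorded in different units; the default of 1000 assumes LoanAmount is in thousands and should be checked against the data.

```python
import pandas as pd


def add_engineered_features(df, loan_amount_scale=1000):
    """Return a copy of df with Total_Income, EMI and Balance_Income added.

    loan_amount_scale is an ASSUMPTION: it converts EMI into the same
    unit as the income columns (1000 if LoanAmount is in thousands,
    1 if both already share a unit).
    """
    out = df.copy()
    # Total Income: applicant and coapplicant income combined.
    out["Total_Income"] = out["ApplicantIncome"] + out["CoapplicantIncome"]
    # EMI: ratio of the loan amount to the loan amount term.
    out["EMI"] = out["LoanAmount"] / out["Loan_Amount_Term"]
    # Balance Income: income left after the EMI has been paid.
    out["Balance_Income"] = out["Total_Income"] - out["EMI"] * loan_amount_scale
    return out


# One toy applicant, invented for illustration.
sample = pd.DataFrame({
    "ApplicantIncome": [5000],
    "CoapplicantIncome": [1500],
    "LoanAmount": [120],
    "Loan_Amount_Term": [360],
})
features = add_engineered_features(sample)
print(features[["Total_Income", "EMI", "Balance_Income"]])
```

Rows with a missing or zero loan term would need handling before the division; that step is left out here.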

  • We can see that:
    • Credit_History is the most important feature, followed by
    • Balance Income,
    • EMI,
    • Total Income (ApplicantIncome + CoapplicantIncome).
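
One common way to obtain such a ranking is the impurity-based feature_importances_ attribute of a scikit-learn random forest. The text does not say which model produced the ranking above, so the choice of RandomForestClassifier here is an assumption, as are the helper name rank_features and the synthetic data, which is generated only so that the sketch runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def rank_features(X, y, random_state=0):
    """Fit a random forest and return its feature importances, highest first."""
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(X, y)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False)


# Synthetic stand-in data; NOT the real loan dataset.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Credit_History": rng.integers(0, 2, n),
    "Balance_Income": rng.normal(5000, 1500, n),
    "EMI": rng.normal(0.4, 0.1, n),
    "Total_Income": rng.normal(6000, 2000, n),
})
# The synthetic target follows Credit_History, with about 10% of labels flipped.
flip = rng.random(n) < 0.1
y = np.where(flip, 1 - X["Credit_History"], X["Credit_History"])

ranking = rank_features(X, y)
print(ranking)
```

Impurity-based importances sum to 1 across features, so they are best read as relative shares rather than absolute effect sizes.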

To learn more about the data analysis, see the project on GitHub.