Saturday, March 26, 2022

Decision tree example

Decision tree example on Loan dataset

Step#01

Import necessary libraries

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

%matplotlib inline

import numpy as np

Step#02

Read the dataset from the directory path

df=pd.read_csv('/content/drive/MyDrive/dataset/loan_data.csv')

df.head() # shows first 5 rows from dataset

Step#03

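I have chosen Google Colab, so I import the dataset directly from my Drive. Run this step before Step#02, because the path used there exists only after the drive is mounted.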

from google.colab import drive

drive.mount('/content/drive')

Step#04

First we need to convert the string values in the variable 'purpose' to dummy variables, because the purpose column is categorical.
sklearn cannot work with string categories directly, so we transform them in one clean step using pd.get_dummies.
The approach shown here can be extended to multiple categorical features if necessary.
1. Create a list of 1 element containing the string 'purpose'. Call this list cat_feats.

cat_feats = ['purpose']

Step#05

2. Now use pd.get_dummies(df, columns=cat_feats, drop_first=True) to create a larger dataframe that has new feature columns with dummy variables. Set this dataframe as final_data.

final_data = pd.get_dummies(df, columns = cat_feats, drop_first = True)
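To see what drop_first=True does, here is a minimal sketch on a toy dataframe. The rows, the category values, and the 'int.rate' column are made up for illustration and are not taken from loan_data.csv.

```python
import pandas as pd

# Hypothetical toy frame; the values are invented for this illustration.
toy = pd.DataFrame({
    "purpose": ["credit_card", "debt_consolidation", "educational", "credit_card"],
    "int.rate": [0.12, 0.10, 0.14, 0.11],
})

cat_feats = ["purpose"]  # more column names can be added to this list
toy_dummies = pd.get_dummies(toy, columns=cat_feats, drop_first=True)

# The first category in sorted order ('credit_card') gets no column of its own;
# it is implied whenever all remaining 'purpose_*' columns are zero/False.
print(toy_dummies.columns.tolist())
```

Dropping one category avoids a redundant column, since its value can always be derived from the others.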

Step#06

Now compare both datasets

df.head()

final_data.head()

Step#07

from sklearn.model_selection import train_test_split

X = final_data.drop('not.fully.paid', axis = 1) # drop the target column 'not.fully.paid' from the features (axis = 1 means columns)

y = final_data['not.fully.paid']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20 , random_state= 101)

Step#08

Let's start by training a single decision tree.
First import DecisionTreeClassifier,
then create an instance of DecisionTreeClassifier() called dtree and fit it to the training data.

from sklearn.tree import DecisionTreeClassifier

dtree = DecisionTreeClassifier()

dtree.fit(X_train, y_train) # other fit arguments keep their defaults; X_idx_sorted is no longer accepted by recent scikit-learn releases
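A decision tree can pick differently between equally good splits from run to run, so results may vary slightly. The sketch below, which uses synthetic data from make_classification as a stand-in (an assumption, not the loan dataset), shows one way to make the fit repeatable by passing random_state.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan features; NOT the real dataset.
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=101)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=101
)

# Two trees built with the same random_state on the same training data.
tree_a = DecisionTreeClassifier(random_state=101).fit(X_tr, y_tr)
tree_b = DecisionTreeClassifier(random_state=101).fit(X_tr, y_tr)

# Both trees should agree on every test row.
same_predictions = (tree_a.predict(X_te) == tree_b.predict(X_te)).all()
print(same_predictions)
print(tree_a.get_depth())  # depth of the fitted tree
```

The same idea applies to the loan data: DecisionTreeClassifier(random_state=101) in place of DecisionTreeClassifier() would make the report in the next step reproducible.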

Step#09

Create predictions from the test set and create a classification report.

y_predict = dtree.predict(X_test)

from sklearn.metrics import classification_report

print(classification_report(y_test, y_predict))
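To read the classification report, it helps to look at the confusion matrix it is computed from. The sketch below uses six hand-made labels (invented for illustration, not model output on the loan data) so the numbers can be checked by hand.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical labels: 1 means 'not fully paid', 0 means 'fully paid'.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# output_dict=True returns the same numbers as the printed report, as a dict.
report = classification_report(y_true, y_pred, output_dict=True)

# Class 1: one of the two predicted positives is correct (precision 0.5),
# and one of the two actual positives is found (recall 0.5).
print(report["1"]["precision"], report["1"]["recall"])
```

On an imbalanced target such as 'not.fully.paid', the per-class precision and recall are usually more informative than the overall accuracy.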
