Effects of Prosper Loan Characteristics on Their Interest Rate and Loan Status

Investigation Overview

We explore Loans data from Prosper and look at the features that are associated with the Interest Rates and the Loan Status. We look at the Loan Amount, EstimatedLoss and the ProsperScore in this exploration and how the Interest Rate itself is associated with the Loan Status.

Dataset Overview

The dataset contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, prosper score and many others. We used Loans that had a LoanOriginationDate after July, 2009 due to unavailability of ProsperScores and Estimated Loss for Loans that originated before that date. We ended up exploring a dataset of 84853 Loans.

Link to Detailed Analysis
Link to Project Repository

In [3]:
# Categorical Datatype Conversions
loan_df['LoanStatus'] = loan_df['LoanStatus'].astype('category')

# Remove data without ProsperScore(s)
loan_df = loan_df[loan_df['ProsperScore'].notna()]

# Change ProsperScore to String.
loan_df['ProsperScore'] = loan_df['ProsperScore'].astype(int).astype(str)
# Ordered Categorical Data
ordinal_var_dict = {'ProsperScore': ['1','2','3 ','4','5','6','7','8','9','10','11']}

for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered = True,
                                                categories = ordinal_var_dict[var])
    loan_df[var] = loan_df[var].astype(ordered_var)
    
# default color to be used for plots
default_color = sns.color_palette()[0]

Distribution of Interest Rates

The interest rates for the Loans range from 0.04 to 0.36. The Interest Rate for most of the Loans ranged from approximately 0.30 to 0.33.

In [4]:
binsize = 0.005
bins = np.arange(loan_df['BorrowerRate'].min(), loan_df['BorrowerRate'].max()+binsize, binsize)

plt.figure(figsize=[8, 5])
plt.hist(loan_df['BorrowerRate'],bins=bins)
plt.title('Distribution of Interest Rates')
plt.xlabel('Interest Rate')
plt.show();

Distribution of Loan Status

Most of the Loans recorded have a Current. The Completed and Chargedoff Loans follow in descending order. The Loans assigned with the remaining status recorded small numbers.

In [5]:
plt.figure(figsize=(15,8))
sns.countplot(data=loan_df, x='LoanStatus', color=default_color);
plt.xticks(rotation=80);
plt.title('Distribution of the Loan Status');
plt.xlabel('Loan Status');
plt.ylabel(' ');

Interest Rate vs. Loan Status

Most of the loans that were Chargedof had high interest rates followed by Defaulted Loans. The Completed Loans have a fair distribution across the quartiles. Most of the Current Loans have a smaller Interest Rate.The remaining have a fairly equal distribution of Interest Rates.

In [6]:
plt.figure(figsize=(20,10))
sns.violinplot(data=loan_df,y='BorrowerRate',x='LoanStatus',inner='quartile', color=default_color);

plt.xticks(rotation=80);
plt.xlabel('Loan Status')
plt.ylabel('Interest Rate')
plt.title('Interest Rate vs Loan Status')

plt.show();

Interest Rate vs. Loan Amount

Loans less than 10,000USD have Interest Rates ranging from the 0.05 to 0.35. The Interest Rates reduce as the Loan Amount increases. Loans of about 35,000USD, for example, have their Interest Rates ranging from approximately 0.10 to 0.20.

In [7]:
plt.figure(figsize=(20,10))
sns.scatterplot(data=loan_df, x='BorrowerRate', y="LoanOriginalAmount", alpha=1/10);

plt.xlabel('Interest Rate')
plt.ylabel('Loan Amount ($)')
plt.title('Loan Amount in US Dollars vs. Interest Rate');

plt.show();

Interest Rate vs. Estimated Loss

Generally, an increase in Estimated Loss correlated with Interest Rate. The Estimated Loss for most of them capped at 0.2 and at an Interest Rate of about 0.33.

In [8]:
plt.figure(figsize=(20,10))
sns.scatterplot(data=loan_df, x='BorrowerRate', y="EstimatedLoss",alpha=1/10);

plt.xlabel('Interest Rate')
plt.ylabel('Estimated Loss')
plt.title('Estimated Loss vs. Interest Rate');

plt.show();

Loan Status vs. Loan Amount

Most of the loans in the ChargedOff, Completed and Defaulted Status were smaller Loan Amounts. What we will later explore is the possible reason for these Loans to be Chargedoff or Defaulted even though they were smaller amounts.

In [9]:
plt.figure(figsize=(15,10))
sns.violinplot(data=loan_df, x='LoanStatus', y='LoanOriginalAmount',
               inner="quartile", color=default_color);
plt.xticks(rotation=45);

plt.xlabel('Loan Status')
plt.ylabel('Loan Amount')
plt.title('The Loan Amount at the various Loan Status');

plt.show();

Selected Loan Status vs. Interest Rate vs. Loan Amount

For Completed and Current Loans we observe that the high Loan Amount have a smaller Interest Rates. The remaining Loan Status appear to have higher Interest Rates as seen by the opaque nature (less transparency) of the distribution at the right even though the Loan Amounts are smaller. Most of the Loan Amounts are below 15,000 USD.

The higher Interest Rates could be the reason why some Loans turned out to be Chargedoff or Defaulted even though the Loan Amounts were small as observed early on.

In [10]:
g = sns.FacetGrid(data = loan_df, col = 'LoanStatus', height = 4, col_wrap=3,
                  col_order=['Completed', 'Current','Chargedoff', 'Defaulted'])
g.map(plt.scatter, 'BorrowerRate', 'LoanOriginalAmount', alpha=0.05);
g.set_titles('{col_name}');
g.set(xlabel='Interest Rate', ylabel='Loan Amount');

Loan Status vs. Interest Rate vs. ProsperScore

Defaulted and Chargedoff Loans have high Interest Rates and lower ProsperScores (high risk scores). As we move from high risk to low risk along for the Prosper Scores, we observe that the Interest Rates also decrease.

In [11]:
plt.figure(figsize=[20,10])
cat_means = loan_df.groupby(['ProsperScore', 'LoanStatus']).mean()['BorrowerRate']
cat_means = cat_means.reset_index(name = 'borrower_rate_avg')
cat_means = cat_means.pivot(index = 'LoanStatus', columns = 'ProsperScore',
                            values = 'borrower_rate_avg')
ax = sns.heatmap(cat_means, annot = True, fmt = '.3f',cmap = 'vlag_r', center= 0,
           cbar_kws = {'label' : 'Mean Interest Rate'})
ax.set_ylim(11,0);
ax.set_title('Interest Rates For Loan Status across Prosper Scores', fontsize=30);