Reputation: 173
I am currently working on a project analyzing student loans, comparing them by gender, but I have run into a problem. When I sum the LoanAmount column of the dataset, I get a different number than the sum for both genders combined. Here is my code:
print(male, female)
286 70
f_sum = 0
m_sum = 0
for i in df['LoanAmount']:
    for x in df['Gender']:
        if x == 'Female':
            f_sum += i
        else:
            m_sum += i
print('Total Sum of LoanAmount:', df['LoanAmount'].sum())
print('Sum of Both Genders:', f_sum + m_sum)
Total Sum of LoanAmount: 49280.0
Sum of Both Genders: 128872
Am I doing something wrong here? I realize that this may not be enough information, and if you have any questions I am happy to answer.
Upvotes: 0
Views: 53
Reputation: 387
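It seems you are looping through the dataframe twice: an outer loop over every LoanAmount and an inner loop over every Gender. As a result, each loan amount is added once for every row in the dataframe instead of once for its own row. What you want to do instead is apply a filter to your dataframe. You can do this as follows: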
female_sum = df[df['Gender']=='Female']['LoanAmount'].sum()
male_sum = df[df['Gender']=='Male']['LoanAmount'].sum()
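To see why the nested loop inflates the total, here is a minimal sketch on a hypothetical three-row dataframe (only the column names come from the question; the values are made up). The nested loop adds every loan amount once per row, so the combined sum comes out as `len(df)` times the real total. If you want to keep a loop, pairing each amount with the gender on the same row via `zip` gives the correct per-gender sums.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "LoanAmount": [100, 50, 200],
})

# Nested loops: every amount is visited once for EACH gender value.
f_nested = m_nested = 0
for i in df["LoanAmount"]:
    for x in df["Gender"]:
        if x == "Female":
            f_nested += i
        else:
            m_nested += i

# Single pass: pair each amount with the gender on the same row.
f_sum = m_sum = 0
for amount, gender in zip(df["LoanAmount"], df["Gender"]):
    if gender == "Female":
        f_sum += amount
    else:
        m_sum += amount

print(f_nested + m_nested)     # 3 rows x 350 = 1050
print(f_sum, m_sum)            # 50 300
print(df["LoanAmount"].sum())  # 350
```

The filtered `.sum()` calls above are still the more idiomatic choice; the loop is only shown to make the overcounting visible.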
Upvotes: 0
Reputation: 398
You can filter the data by gender and sum each subset:
f_sum = df[df['Gender'] == 'Female']['LoanAmount'].sum()
m_sum = df[df['Gender'] == 'Male']['LoanAmount'].sum()
Upvotes: 0
Reputation: 31
To sum the loan amount for each gender, you should use:
male_loans = df.loc[df.Gender == 'Male', 'LoanAmount'].sum()
female_loans = df.loc[df.Gender == 'Female', 'LoanAmount'].sum()
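One caveat worth checking, sketched below on hypothetical data: the filtered sums only cover rows whose `Gender` is exactly 'Male' or 'Female'. If some rows have a missing (NaN) gender, `male_loans + female_loans` will be smaller than `df['LoanAmount'].sum()`, whereas the `else` branch in the question silently counts those rows as male. Selecting the remaining rows with `Series.isna()` lets the three parts add back up to the total.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing gender value.
df = pd.DataFrame({
    "Gender": ["Male", "Female", np.nan, "Male"],
    "LoanAmount": [100.0, 50.0, 30.0, 200.0],
})

male_loans = df.loc[df.Gender == "Male", "LoanAmount"].sum()
female_loans = df.loc[df.Gender == "Female", "LoanAmount"].sum()
# Rows whose gender is missing are matched by neither filter above.
unknown_loans = df.loc[df.Gender.isna(), "LoanAmount"].sum()

total = df["LoanAmount"].sum()
print(male_loans, female_loans, unknown_loans)  # 300.0 50.0 30.0
print(male_loans + female_loans + unknown_loans == total)  # True
```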
Upvotes: 0
Reputation: 57075
What you need is to group by gender and then sum the loan amounts:
df.groupby('Gender')['LoanAmount'].sum()
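A minimal sketch of the grouped version, again on hypothetical data. Selecting `LoanAmount` before calling `.sum()` avoids summing the other (possibly non-numeric) columns. By default `groupby` leaves out rows whose key is NaN; passing `dropna=False` (available since pandas 1.1) keeps them as their own group, so the group sums add up to the column total.

```python
import numpy as np
import pandas as pd

# Hypothetical data; one row has a missing gender.
df = pd.DataFrame({
    "Gender": ["Male", "Female", np.nan, "Male"],
    "LoanAmount": [100.0, 50.0, 30.0, 200.0],
})

# Default: the NaN-gender row is left out of every group.
by_gender = df.groupby("Gender")["LoanAmount"].sum()
print(by_gender.to_dict())  # {'Female': 50.0, 'Male': 300.0}

# dropna=False keeps missing genders as a separate group.
by_gender_all = df.groupby("Gender", dropna=False)["LoanAmount"].sum()
print(by_gender_all.sum() == df["LoanAmount"].sum())  # True
```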
Upvotes: 2