XxEthan70xX

Reputation: 173

Trouble iterating with two columns in a DataFrame

I am currently working on a project analyzing student loans. I am comparing student loans by gender, but I have run into a problem: when I sum the loan column of the whole dataset, I get a different number than the sum of both genders combined. Here is my code.

print(male, female)

286 70

f_sum = 0

m_sum = 0

for i in df['LoanAmount']:
  for x in df['Gender']:
    if x == 'Female':
        f_sum += i
    else:
        m_sum += i

print('Total Sum of LoanAmount:', df['LoanAmount'].sum())

print('Sum of Both Genders:', f_sum + m_sum)

Total Sum of LoanAmount: 49280.0

Sum of Both Genders: 128872

Am I doing something wrong here? I realize that this may not be enough information, and if you have any questions I am happy to answer.

Upvotes: 0

Views: 53

Answers (4)

Steven Stip

Reputation: 387

Your nested loops go through the DataFrame twice: for each LoanAmount, the inner loop runs over every Gender value, so each amount is added once per row instead of once. What you want to do is apply a filter to your DataFrame instead. You can do this as follows:

female_sum = df[df['Gender']=='Female']['LoanAmount'].sum()
male_sum = df[df['Gender']=='Male']['LoanAmount'].sum()
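
To illustrate the difference, here is a minimal sketch on a made-up three-row frame (the data is hypothetical, not the asker's dataset). It compares the nested loops, a row-wise loop using `zip`, and the filter shown above:

```python
import pandas as pd

# Hypothetical toy data, not the asker's dataset
df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male'],
    'LoanAmount': [100.0, 200.0, 300.0],
})

# Nested loops pair every loan with every gender value,
# so each amount is added once per row of the frame
nested_total = 0
for amount in df['LoanAmount']:
    for gender in df['Gender']:
        nested_total += amount

# zip walks both columns in lockstep: one (gender, amount) pair per row
f_sum = 0
m_sum = 0
for gender, amount in zip(df['Gender'], df['LoanAmount']):
    if gender == 'Female':
        f_sum += amount
    else:
        m_sum += amount

# Boolean filter, as in the answer above
female_sum = df[df['Gender'] == 'Female']['LoanAmount'].sum()
male_sum = df[df['Gender'] == 'Male']['LoanAmount'].sum()

print(nested_total)           # 1800.0, i.e. 600.0 counted once per row (3 rows)
print(f_sum + m_sum)          # 600.0
print(female_sum + male_sum)  # 600.0
```

One difference to keep in mind: the `else` branch in the loop counts every value that is not 'Female' as male, including missing genders, whereas the `== 'Male'` filter only matches rows that are exactly 'Male'.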

Upvotes: 0

Taylor

Reputation: 398

You can filter the data by gender and sum them up:

    f_sum = df[df['Gender'] == 'Female']['LoanAmount'].sum()
    m_sum = df[df['Gender'] == 'Male']['LoanAmount'].sum()

Upvotes: 0

Mike C

Reputation: 31

To sum the loan amount for each gender, you should use:

male_loans = df.loc[df.Gender == 'Male', 'LoanAmount'].sum()
female_loans = df.loc[df.Gender == 'Female', 'LoanAmount'].sum()

Upvotes: 0

DYZ

Reputation: 57075

What you need is to group by gender and then sum the loan amounts:

df.groupby('Gender')['LoanAmount'].sum()
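
As a rough sketch of what the grouping returns, here is a small example on made-up data (the rows are hypothetical, including one with a missing gender). By default `groupby` leaves out rows whose key is missing, which is one possible reason per-gender sums can fall short of the column total; passing `dropna=False` (available in pandas 1.1 and later) keeps them as their own group:

```python
import pandas as pd

# Hypothetical toy data with one missing Gender value
df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male', None],
    'LoanAmount': [100.0, 200.0, 300.0, 50.0],
})

# One sum per gender; the row with a missing Gender is excluded by default
per_gender = df.groupby('Gender')['LoanAmount'].sum()
print(per_gender['Female'])  # 200.0
print(per_gender['Male'])    # 400.0

# Keep missing keys as their own group so the parts add up to the column total
with_missing = df.groupby('Gender', dropna=False)['LoanAmount'].sum()
print(with_missing.sum() == df['LoanAmount'].sum())  # True
```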

Upvotes: 2
