Joep Hoeijmakers
Joep Hoeijmakers

Reputation: 35

Calculate proportions of rows in dataframe

I have a problem that you hopefully can help with.

I have a dataframe with multiple columns that looks something like this:

education   experience  ExpenseA    ExpenseB    ExpenseC
uni         yes         3           2            5
uni         no          7           6            8
middle      yes         2           0            8
high        no          12          5            8
uni         yes         3           7            5

The Expenses A, B and C should add up to 10 per row, but often they don't because the data was not gathered correctly. For the rows where this is not the case, I want to take proportions.

The formula for this should be (cell value) / ((sum [ExpenseA] til [ExpenseC])/10)

example row two: total = 21 --> cells should be (value / 2.1)

How can I itterate this over all the rows for these specific columns?

Upvotes: 2

Views: 219

Answers (1)

jezrael
jezrael

Reputation: 863166

I think you need divide sum of columns with exclude first 2 columns selected by DataFrame.iloc:

df.iloc[:, 2:] = df.iloc[:, 2:].div(df.iloc[:, 2:].sum(axis=1).div(10), axis=0)
print (df)
  education experience  ExpenseA  ExpenseB  ExpenseC
0       uni        yes  3.000000  2.000000  5.000000
1       uni         no  3.333333  2.857143  3.809524
2    middle        yes  2.000000  0.000000  8.000000
3      high         no  4.800000  2.000000  3.200000
4       uni        yes  2.000000  4.666667  3.333333

Or sum columns with Expense substrings by DataFrame.filter:

df1 = df.filter(like='Expense')

df[df1.columns] = df1.div(df1.sum(axis=1).div(10), axis=0)

Upvotes: 1

Related Questions