Reputation: 35
I have a problem that you hopefully can help with.
I have a dataframe with multiple columns that looks something like this:
education experience ExpenseA ExpenseB ExpenseC
uni yes 3 2 5
uni no 7 6 8
middle yes 2 0 8
high no 12 5 8
uni yes 3 7 5
The Expenses A, B and C should add up to 10 per row, but often they don't because the data was not gathered correctly. For the rows where this is not the case, I want to take proportions.
The formula for this should be (cell value) / ((sum [ExpenseA] til [ExpenseC])/10)
example row two: total = 21 --> cells should be (value / 2.1)
How can I itterate this over all the rows for these specific columns?
Upvotes: 2
Views: 219
Reputation: 863166
I think you need divide sum of columns with exclude first 2 columns selected by DataFrame.iloc
:
df.iloc[:, 2:] = df.iloc[:, 2:].div(df.iloc[:, 2:].sum(axis=1).div(10), axis=0)
print (df)
education experience ExpenseA ExpenseB ExpenseC
0 uni yes 3.000000 2.000000 5.000000
1 uni no 3.333333 2.857143 3.809524
2 middle yes 2.000000 0.000000 8.000000
3 high no 4.800000 2.000000 3.200000
4 uni yes 2.000000 4.666667 3.333333
Or sum columns with Expense
substrings by DataFrame.filter
:
df1 = df.filter(like='Expense')
df[df1.columns] = df1.div(df1.sum(axis=1).div(10), axis=0)
Upvotes: 1