Reputation: 25997
I have a dataframe like this (usually with far more columns and rows):
A B
0 5 10
1 15 3
Now I want to add columns A_ratio
and B_ratio
to this dataframe whereby the values there represent A/(A + B)
and B/(A + B)
, respectively. So A_ratio
and B_ratio
should add up to 1 in each row of the dataframe.
My first attempt looked like this:
import pandas as pd
df = pd.DataFrame({'A': [5,15], 'B': [10,3]})
for coli in df:
df[coli + '_ratio'] = df[coli]/df.sum(axis=1)
giving me the following result:
A B A_ratio B_ratio
0 5 10 0.333333 0.652174
1 15 3 0.833333 0.159292
Clearly, the columns A_ratio
and B_ratio
do not add up to 1. While the values in A_ratio
are correct they are wrong in B_ratio
since the row sum is changed when A_ratio
is added.
A workaround could be to copy the dataframe first:
df2 = pd.DataFrame({'A': [5,15], 'B': [10,3]})
df2cl = df2.copy()
for coli in df2:
df2[coli + '_ratio'] = df2[coli]/df2cl.sum(axis=1)
which gives me the desired output:
A B A_ratio B_ratio
0 5 10 0.333333 0.666667
1 15 3 0.833333 0.166667
Is there a more efficient way of doing this which avoids copying the dataframe?
Upvotes: 3
Views: 194
Reputation: 967
You don't need to call sum each time.
>>%timeit %run multiple_sum.py
100 loops, best of 3: 6.59 ms per loop
>>%timeit %run single_sum.py
100 loops, best of 3: 3.84 ms per loop
if you have a big dataframe this is going to be needless overhead.
sums = df.sum(axis=1)
for coli in df:
df[coli + '_ratio'] = df[coli]/sums
is sufficient
Upvotes: 2
Reputation: 393903
You can just sub-select from your df so that it only sums those 2 columns:
In [195]:
for coli in df:
df[coli + '_ratio'] = df[coli]/df[['A','B']].sum(axis=1)
df
Out[195]:
A B A_ratio B_ratio
0 5 10 0.333333 0.666667
1 15 3 0.833333 0.166667
You can just take a copy of the column names upfront if you don't want to hardcode them:
In [197]:
cols = df.columns
for coli in df:
df[coli + '_ratio'] = df[coli]/df[cols].sum(axis=1)
df
Out[197]:
A B A_ratio B_ratio
0 5 10 0.333333 0.666667
1 15 3 0.833333 0.166667
Upvotes: 2