Reputation: 1846
I want to add total fields to this DataFrame:
import pandas as pd

df_test = pd.DataFrame([
    {'id': 1, 'cat1a': 3, 'cat1b': 2, 'cat2a': 4, 'cat2b': 3},
    {'id': 2, 'cat1a': 7, 'cat1b': 5, 'cat2a': 9, 'cat2b': 6}
])
This code almost works:
def add_total(therecord):
    t1 = therecord['cat1a'] + therecord['cat1b']
    t2 = therecord['cat2a'] + therecord['cat2b']
    return t1, t2
df_test['cat1tot', 'cat2tot'] = df_test[['cat1a', 'cat1b', 'cat2a', 'cat2b']].apply(add_total,axis=1)
Except that it results in only one new column.
And this code:
def add_total(therecord):
    t1 = therecord['cat1a'] + therecord['cat1b']
    t2 = therecord['cat2a'] + therecord['cat2b']
    return [t1, t2]
df_test[['cat1tot', 'cat2tot']] = df_test[['cat1a', 'cat1b', 'cat2a', 'cat2b']].apply(add_total,axis=1)
Results in: KeyError: "['cat1tot' 'cat2tot'] not in index"
I tried to resolve that with:
my_cols_list=['cat1tot','cat2tot']
df_test.reindex(columns=[*df_test.columns.tolist(), *my_cols_list], fill_value=0)
But that didn't solve the problem. So what am I missing?
Upvotes: 1
Views: 159
Reputation: 402263
Return a Series object instead:
def add_total(therecord):
    t1 = therecord['cat1a'] + therecord['cat1b']
    t2 = therecord['cat2a'] + therecord['cat2b']
    return pd.Series([t1, t2])
And then,
df_test[['cat1tot', 'cat2tot']] = \
    df_test[['cat1a', 'cat1b', 'cat2a', 'cat2b']].apply(add_total, axis=1)
df_test
   cat1a  cat1b  cat2a  cat2b  id  cat1tot  cat2tot
0      3      2      4      3   1        5        7
1      7      5      9      6   2       12       15
This works because apply will special-case the Series return type and assume you want the result as a DataFrame slice.
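A related sketch, assuming pandas 0.23 or later, where DataFrame.apply accepts a result_type argument: passing result_type='expand' asks apply to spread list-like return values across columns, so the function can return a plain tuple. The name add_total_tuple is made up for this illustration, and the join step is just one way to attach the result.

```python
import pandas as pd

df_test = pd.DataFrame([
    {'id': 1, 'cat1a': 3, 'cat1b': 2, 'cat2a': 4, 'cat2b': 3},
    {'id': 2, 'cat1a': 7, 'cat1b': 5, 'cat2a': 9, 'cat2b': 6},
])

def add_total_tuple(therecord):
    # Hypothetical variant of add_total that returns a plain tuple.
    t1 = therecord['cat1a'] + therecord['cat1b']
    t2 = therecord['cat2a'] + therecord['cat2b']
    return t1, t2

# result_type='expand' turns each returned tuple into one row of a DataFrame
# whose columns are labelled 0 and 1.
totals = df_test[['cat1a', 'cat1b', 'cat2a', 'cat2b']].apply(
    add_total_tuple, axis=1, result_type='expand'
)

# Name the expanded columns, then attach them by index.
totals.columns = ['cat1tot', 'cat2tot']
df_test = df_test.join(totals)
```

This still calls the function once per row, so it shares the performance cost of any row-wise apply.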
Upvotes: 2
Reputation: 164613
It's generally not a good idea to use df.apply unless you absolutely must. The reason is that these operations are not vectorised: in the background, a Python-level loop feeds each row into the function as its own pd.Series.
This would be a vectorised implementation:
df_test['cat1tot'] = df_test['cat1a'] + df_test['cat1b']
df_test['cat2tot'] = df_test['cat2a'] + df_test['cat2b']
#    cat1a  cat1b  cat2a  cat2b  id  cat1tot  cat2tot
# 0      3      2      4      3   1        5        7
# 1      7      5      9      6   2       12       15
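If there are many such groups, the same vectorised idea can be driven from a mapping of total column to component columns. This is only a sketch: the groups dictionary is an assumption made up for illustration, and DataFrame.sum(axis=1) does the row-wise addition without calling a Python function per row.

```python
import pandas as pd

df_test = pd.DataFrame([
    {'id': 1, 'cat1a': 3, 'cat1b': 2, 'cat2a': 4, 'cat2b': 3},
    {'id': 2, 'cat1a': 7, 'cat1b': 5, 'cat2a': 9, 'cat2b': 6},
])

# Hypothetical mapping: name of each total column -> columns to add up.
groups = {
    'cat1tot': ['cat1a', 'cat1b'],
    'cat2tot': ['cat2a', 'cat2b'],
}

for total_col, parts in groups.items():
    # sum(axis=1) adds the selected columns row by row in one vectorised call.
    df_test[total_col] = df_test[parts].sum(axis=1)
```

Note that sum skips NaN values by default, whereas the + operator propagates them, so the two approaches differ when data is missing.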
Upvotes: 2
Reputation: 2889
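How about unpacking the result with zip? This assumes add_total returns a tuple, as in the first version in the question; zip(*...) turns the per-row tuples into one sequence per new column:
df_test['cat1tot'], df_test['cat2tot'] = \
    zip(*df_test[['cat1a', 'cat1b', 'cat2a', 'cat2b']].apply(add_total, axis=1))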
Upvotes: 1