shortorian

Reputation: 1182

What's the difference between the pandas DataFrame methods agg() and apply()?

There are a number of SO questions regarding agg and apply on pandas DataFrame.groupby() objects, but I don't understand the difference between DataFrame.agg() and DataFrame.apply(). From the docs and the snippet below, they look the same to me. If there are issues specifically related to row operations that don't apply to operations on columns, I'd like to know about them.

import pandas as pd

a = pd.Series([True, False, False])
b = pd.Series([False, False, False])
c = pd.Series([True, True, False])
d = pd.Series([1, 2, 3])

print(pd.DataFrame({'a': a, 'b': b, 'c': c, 'd': d}).agg(lambda x: print(len(x)), axis=1))
print()
print(pd.DataFrame({'a': a, 'b': b, 'c': c, 'd': d}).apply(lambda x: print(len(x)), axis=1))
4
4
4
0    None
1    None
2    None
dtype: object

4
4
4
0    None
1    None
2    None
dtype: object

Upvotes: 0

Views: 111

Answers (1)

scotscotmcc

Reputation: 3113

Both end up calling the same internal frame_apply(...) function with largely the same arguments. One difference is that df.agg() (which is just an alias of df.aggregate()) has an extra step: it passes func through reconstruct_func(), which does a little cleanup and normalization of whatever func you pass in.
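A minimal sketch of that shared behavior, assuming a small all-numeric frame and a hypothetical reducer named value_range (neither is from the question). With a plain callable, agg() and apply() should return the same thing along either axis:

```python
import pandas as pd

# Hypothetical numeric frame, chosen only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})


def value_range(values):
    """Reduce one column or one row (passed in as a Series) to max - min."""
    return values.max() - values.min()


# Column-wise (default axis=0): one result per column.
agg_cols = df.agg(value_range)
apply_cols = df.apply(value_range)

# Row-wise (axis=1): one result per row.
agg_rows = df.agg(value_range, axis=1)
apply_rows = df.apply(value_range, axis=1)

print(agg_cols.equals(apply_cols))
print(agg_rows.equals(apply_rows))
```

Both comparisons should print True, which matches the identical output in the question's snippet.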

For example, with a column of integers, df['col1'].agg(sum) returns a result, but df['col1'].apply(sum) raises an error: df['col1'] is a Series, so .apply() calls the built-in sum on each individual element, and an integer is not iterable. Both .agg() and .apply() work with the string 'sum', though. So .agg() is a little more flexible about what it accepts.
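The Series case can be reproduced with a short sketch. The data here is made up, and the warnings filter is an assumption on my part: some pandas 2.x releases emit a FutureWarning when a built-in such as sum is passed to agg().

```python
import warnings

import pandas as pd

# Hypothetical integer column, standing in for df['col1'].
s = pd.Series([1, 2, 3], name="col1")

with warnings.catch_warnings():
    # Assumption: silence the possible FutureWarning about built-in callables.
    warnings.simplefilter("ignore")
    total_via_agg = s.agg(sum)  # the whole Series is reduced to one value

# Series.apply calls the function once per element, so this becomes sum(1),
# which fails because an int is not iterable.
try:
    s.apply(sum)
    apply_error = None
except TypeError as exc:
    apply_error = exc

# The string form dispatches to Series.sum() in both methods.
total_agg_str = s.agg("sum")
total_apply_str = s.apply("sum")

print(total_via_agg, total_agg_str, total_apply_str)
print(type(apply_error).__name__)
```

If the callable is meant to reduce the whole column, agg() or the string name is the safer choice on a Series, since apply() there works element by element.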

Upvotes: 1
