Reputation: 1182
There are a number of SO questions about agg and apply on pandas DataFrame.groupby() objects, but I don't understand the difference between DataFrame.agg() and DataFrame.apply(). From the docs and the snippet below, they look the same to me. If there are issues specific to row operations that don't apply to column operations, I'd like to know about them.
import pandas as pd

a = pd.Series([True, False, False])
b = pd.Series([False, False, False])
c = pd.Series([True, True, False])
d = pd.Series([1, 2, 3])
df = pd.DataFrame({'a': a, 'b': b, 'c': c, 'd': d})

# Call each method row-wise with a function that prints the row length
print(df.agg(lambda x: print(len(x)), axis=1))
print()
print(df.apply(lambda x: print(len(x)), axis=1))

Output:
4
4
4
0 None
1 None
2 None
dtype: object
4
4
4
0 None
1 None
2 None
dtype: object
Upvotes: 0
Views: 111
Reputation: 3113
Both actually call the same frame_apply(...) function, with generally the same arguments going in. One difference is that df.agg() (which is itself identical to df.aggregate()) has a step where it calls a reconstruct_func() function that does a little cleanup/handling of the func you pass in.
For example, you can do df['col1'].agg(sum) and get a result, but if you do df['col1'].apply(sum) you'll get an error, because Series.apply() calls the function on each element and the built-in sum can't iterate over a single integer. Both .agg() and .apply() will work with the string 'sum', though, so .agg() accepts a little more variety. (This is all with a column of integers.)
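A minimal sketch of the comparison above, assuming a small integer column named col1 (the data and the variable names are made up for illustration). Some pandas 2.x releases emit a FutureWarning when the built-in sum is passed to .agg(), so the sketch silences that warning; exact behavior may vary by version.

```python
import warnings

import pandas as pd

# Hypothetical frame with a single integer column
df = pd.DataFrame({"col1": [1, 2, 3]})

# .agg() with the built-in sum reduces the whole column to one value.
# Some pandas versions warn about passing the builtin, so ignore that here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    agg_builtin = df["col1"].agg(sum)

# The string 'sum' works with both methods
agg_string = df["col1"].agg("sum")
apply_string = df["col1"].apply("sum")

# .apply() with the built-in sum calls sum() on each element,
# and a single integer is not iterable, so this raises TypeError
try:
    df["col1"].apply(sum)
    apply_builtin_error = None
except TypeError as exc:
    apply_builtin_error = exc

print(agg_builtin, agg_string, apply_string)
print(type(apply_builtin_error).__name__)
```

Passing the string 'sum' is probably the safer habit, since it behaves the same way with both methods.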
Upvotes: 1