Reputation: 110502
If I'm not mistaken, with df.apply, using:
df.apply(function, axis=1)
passes each row to the function. And then doing something like:
df['col'].apply(..., axis=0)
sends each value to the function. However, I'm wondering whether one would ever use axis=0 when working with more than one column. If so, how could that be used?
Upvotes: 0
Views: 4102
Reputation: 59579
DataFrame.apply passes a single Series at a time, so you can "only" use it for operations on a single row or a single column at a time. Here I'll simply print what is passed when we apply along each axis:
import pandas as pd
df = pd.DataFrame([['a', 1], ['b', 2]], index=['r1', 'r2'], columns=['c1', 'c2'])
# Applying along axis=0 passes each Column Series separately
df.apply(lambda x: print(x, '\n'), axis=0)
#r1 a
#r2 b
#Name: c1, dtype: object
#r1 1
#r2 2
#Name: c2, dtype: int64
# Applying along axis=1 passes each row as a Series
df.apply(lambda x: print(x, '\n'), axis=1)
#c1 a
#c2 1
#Name: r1, dtype: object
#c1 b
#c2 2
#Name: r2, dtype: object
Notice that in the axis=1 case we're still passing a Series. Now the Series is indexed by what used to be the columns, and its name is the row label. Also be careful: the dtype of both row Series was upcast to object, since that is the only dtype capable of holding both the integers and the strings.
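To make the dtype point concrete, here is a small sketch that records the dtype of each Series handed to the function along both axes. It rebuilds the same two-by-two frame as above; the names col_dtypes and row_dtypes are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([['a', 1], ['b', 2]], index=['r1', 'r2'], columns=['c1', 'c2'])

# axis=0: the function sees one column at a time, each with its own dtype
col_dtypes = df.apply(lambda s: str(s.dtype), axis=0)

# axis=1: the function sees one row at a time; mixing strings and integers
# in a single Series forces the object dtype
row_dtypes = df.apply(lambda s: str(s.dtype), axis=1)

print(col_dtypes)
print(row_dtypes)
```

The integer column c2 should report int64 in the column-wise result, while every row reports object. This is one reason row-wise apply on mixed-type frames can be slower and can silently change how numeric values behave.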
I used quotes above for "only" because with enough imagination you can use apply to deal with multiple columns. There are better ways to do this, but this just shows it is possible. Here I'll use apply to multiply each 'val' column by the corresponding 'weight' column. We do this by creating a custom function that also receives the entire DataFrame and then exploits the naming convention of the columns:
import numpy as np
df = pd.DataFrame(np.random.normal(0, 1, (15, 4)),
                  columns=['val1', 'val2', 'weight1', 'weight2'])
def my_weight(s, df):
    # s.name is the column label, e.g. 'val1' -> look up 'weight1'
    return s * df[s.name.replace('val', 'weight')]
df.filter(like='val').apply(lambda col: my_weight(col, df))
# val1 val2
#0 -0.175574 0.301880
#1 -0.032201 0.025987
#2 -2.063913 0.226745
#3 -0.617288 -0.220579
#4 0.912825 0.078496
Instead it would be much simpler to multiply directly:
df['val1']*df['weight1']
#0 -0.175574
#1 -0.032201
#2 -2.063913
#3 -0.617288
#4 0.912825
#dtype: float64
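The direct multiplication also extends to every val/weight pair at once, without apply. A minimal sketch, assuming the columns follow the valN/weightN naming convention used above; the seeded generator and the names vals, weights and weighted are only for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
df = pd.DataFrame(rng.normal(0, 1, (15, 4)),
                  columns=['val1', 'val2', 'weight1', 'weight2'])

vals = df.filter(like='val')

# Relabel weight1 -> val1, weight2 -> val2 so the two frames align by column name
weights = df.filter(like='weight').rename(columns=lambda c: c.replace('weight', 'val'))

# Element-wise product; pandas aligns on both the index and the column labels
weighted = vals * weights
print(weighted.head())
```

Renaming the weight columns is what lets pandas pair each val column with its weight. Without it, the two frames share no column labels and the product would be all NaN.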
Upvotes: 3