Example of df.apply with multiple columns and axis=0

Question

If I'm not mistaken, it seems like with df.apply, using:

df.apply(function, axis=1)

is used to pass each row to the function. And then doing something like:

df['col'].apply(..., axis=0)

is used to send a value to a function. However, I'm wondering if one would ever use axis=0 when using more than one column. If so, how could that be used?

ALollz · Accepted Answer

DataFrame.apply passes a single Series at a time, so you can "only" use it for operations on a single row or a single column at a time. Here I'll simply print what is passed when we apply along each axis:

import pandas as pd
df = pd.DataFrame([['a', 1], ['b', 2]], index=['r1', 'r2'], columns=['c1', 'c2'])

# Applying along axis=0 passes each Column Series separately
df.apply(lambda x: print(x, '\n'), axis=0)
#r1    a
#r2    b
#Name: c1, dtype: object 

#r1    1
#r2    2
#Name: c2, dtype: int64 


# Applying along axis=1 passes each row as a Series
df.apply(lambda x: print(x, '\n'), axis=1)
#c1    a
#c2    1
#Name: r1, dtype: object 

#c1    b
#c2    2
#Name: r2, dtype: object

Notice that in the axis=1 case we're still passing a Series. Now the Series is indexed by what used to be the columns, and its name is the row label. Also note that the dtype was upcast to object for both rows, since object is the only dtype that can hold both the integers and the strings.
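To check the claims above without reading printed output, here is a small sketch (an illustration, not part of the original answer) that records what `DataFrame.apply` hands to the function along each axis. The helper `record` and the dict `seen` are hypothetical names. It also shows one ordinary use of `axis=0` over several columns: a per-column reduction that returns one value per column.

```python
import pandas as pd

df = pd.DataFrame([['a', 1], ['b', 2]], index=['r1', 'r2'], columns=['c1', 'c2'])

# Hypothetical recorder: (which axis, Series name) -> (index labels, dtype)
seen = {}

def record(s, which):
    # Store what apply passed in instead of printing it
    seen[(which, s.name)] = (list(s.index), str(s.dtype))
    return len(s)

df.apply(record, axis=0, args=(0,))  # one call per column
df.apply(record, axis=1, args=(1,))  # one call per row

# A practical axis=0 use on multiple columns: a reduction per column
nums = pd.DataFrame({'x': [1, 4, 2], 'y': [10, 30, 20]})
spread = nums.apply(lambda col: col.max() - col.min(), axis=0)
print(spread)  # one value per column: x -> 3, y -> 20
```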

I put "only" in quotes above because, with enough imagination, you can use apply to deal with multiple columns. There are better ways to do this, but this shows it is possible. Here I'll use apply to multiply each 'val' column by the corresponding 'weight' column. We do this by writing a custom function that also receives the entire DataFrame, and then exploiting the column naming convention:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.normal(0, 1, (15, 4)),
                  columns=['val1', 'val2', 'weight1', 'weight2'])

def my_weight(s, df):
    # s is one 'val' column; look up its matching 'weight' column by name
    return s*df[s.name.replace('val', 'weight')]

df.filter(like='val').apply(lambda col: my_weight(col, df))
#       val1      val2
#0 -0.175574  0.301880
#1 -0.032201  0.025987
#2 -2.063913  0.226745
#3 -0.617288 -0.220579
#4  0.912825  0.078496
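The lambda above exists only to forward `df`. As a sketch, `DataFrame.apply` also accepts an `args` tuple of extra positional arguments that it passes to the function after each Series, so the same call can be written without the lambda. This still relies on the answer's naming assumption that every `valN` column has a matching `weightN` column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.normal(0, 1, (15, 4)),
                  columns=['val1', 'val2', 'weight1', 'weight2'])

def my_weight(s, df):
    # Look up the weight column that matches this val column by name
    return s * df[s.name.replace('val', 'weight')]

# args=(df,) makes apply call my_weight(col, df) for each 'val' column
weighted = df.filter(like='val').apply(my_weight, args=(df,))
print(weighted.head())
```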

Instead it would be much simpler to multiply directly:

df['val1']*df['weight1']
#0   -0.175574
#1   -0.032201
#2   -2.063913
#3   -0.617288
#4    0.912825
#dtype: float64
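If you need every pair at once rather than just `val1`, one vectorized option is sketched below. It assumes the `valN` and `weightN` columns appear in matching order, as they do here. Calling `.to_numpy()` on the weights drops their column labels, so pandas multiplies positionally instead of trying to align `val1` with `weight1` by name (which would produce all-NaN columns).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.normal(0, 1, (15, 4)),
                  columns=['val1', 'val2', 'weight1', 'weight2'])

vals = df.filter(like='val')        # columns val1, val2
weights = df.filter(like='weight')  # columns weight1, weight2, same order

# Positional, elementwise product; the result keeps the 'val' column names
weighted = vals * weights.to_numpy()
print(weighted.head())
```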
