CutePoison

Reputation: 5395

Apply a function which operates on specific columns in a dataframe

Say I have the following data frame (note: it is for illustration only, not the actual problem I need to solve):

import pandas as pd

df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 2],
                   "purchase": [True, False, False, False, True, True],
                   "prod": ["Apple", "Pear", "Banana"] * 2})

    id  purchase prod
----+-----+--------+
    1   True    Apple
    1   False   Pear
    1   False   Banana
    2   False   Apple
    2   True    Pear
    2   True    Banana

and a function that returns only the purchased products:

def get_prod_purch(df):
    """
    Get products
    """
    x = df["purchase"]
    return df.loc[x]

If I run this inside a groupby, it works perfectly:

df.groupby("id").apply(get_prod_purch)
#
        id  purchase prod
id  ----+-----+-------+         
1   0   1   True    Apple
2   4   2   True    Pear
    5   2   True    Banana

but if I try to run it directly on the dataframe, it fails:

df.apply(get_prod_purch)
#KeyError: 'purchase'

df.apply(get_prod_purch,axis=1)
#KeyError: True

Is there a way to run such a function on the whole dataframe rather than on a groupby, i.e.


df.apply(some_function)

#Result
    id  purchase prod
----+-----+--------+
    1   True    Apple
    2   True    Pear
    2   True    Banana

Upvotes: 1

Views: 37

Answers (1)

jezrael

Reputation: 863791

Use DataFrame.pipe, because you need to apply the function to the whole DataFrame:

print (df.pipe(get_prod_purch))
   id  purchase    prod
0   1      True   Apple
4   2      True    Pear
5   2      True  Banana

Or pass the DataFrame to the function directly:

print (get_prod_purch(df))
   id  purchase    prod
0   1      True   Apple
4   2      True    Pear
5   2      True  Banana
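
Both calls keep the original row labels (0, 4, 5). A minimal sketch, assuming you want the result renumbered from 0 as in the question's expected output: chain reset_index(drop=True) after the pipe call. The data and function are copied from the question; the names result and same are only for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "purchase": [True, False, False, False, True, True],
    "prod": ["Apple", "Pear", "Banana"] * 2,
})

def get_prod_purch(df):
    """Return only the rows where `purchase` is True."""
    return df.loc[df["purchase"]]

# pipe passes the whole DataFrame to the function, so it chains cleanly
# with other DataFrame methods such as reset_index.
result = df.pipe(get_prod_purch).reset_index(drop=True)

# The direct call is equivalent; pipe is mainly a convenience for chaining.
same = get_prod_purch(df).reset_index(drop=True)

print(result)
```

Whether to drop the old index is a design choice: keep it if you need to trace rows back to the original frame.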

If you use DataFrame.apply, the function runs per column (the default, axis=0) or per row (axis=1), so it receives a Series each time rather than the whole DataFrame.
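
This is why both apply calls in the question raise KeyError. Called per column, the function gets a Series indexed 0..5, which has no 'purchase' label. Called per row, df["purchase"] is a single boolean, and .loc then looks for a label named True. A small sketch that makes the difference visible; describe_arg is a hypothetical helper introduced here, not part of pandas:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "purchase": [True, False, False, False, True, True],
    "prod": ["Apple", "Pear", "Banana"] * 2,
})

def describe_arg(obj):
    # Hypothetical helper: report the type of whatever pandas passes in.
    return type(obj).__name__

col_types = df.apply(describe_arg)          # called once per column
row_types = df.apply(describe_arg, axis=1)  # called once per row
whole_type = df.pipe(describe_arg)          # called once with the whole frame

print(col_types.tolist())  # one entry per column
print(row_types.tolist())  # one entry per row
print(whole_type)
```

Only pipe (or a direct call) hands the function something that has a "purchase" column to filter on.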

Upvotes: 1
