CutePoison

Reputation: 5355

Operate on columns in pandas groupby

Assume I have a dataframe df which has 4 columns col = ["id","date","basket","gender"] and a function

def is_valid_date(df):
    idx = some_scalar_function(df["basket"])  # returns an index
    date = df["date"].values[idx]
    return date > some_date

I have always understood groupby as creating a new dataframe for each group during the "split" step of "split-apply-combine" (loosely speaking). So if I want to apply is_valid_date to each group of id, I would assume I could do

df.groupby("id").agg(is_valid_date)

but it throws KeyError: 'basket' at the line idx = some_scalar_function(df["basket"]).

Upvotes: 2

Views: 90

Answers (1)

jezrael

Reputation: 862661

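GroupBy.agg works on each column separately, so your function receives a single column (a Series) rather than the group's dataframe, and selecting columns like df["basket"] or df["date"] inside it is not possible.

The solution is to use GroupBy.apply with your custom function, which receives each group as a dataframe: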

df.groupby("id").apply(is_valid_date)
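
For illustration, here is a minimal runnable sketch of the apply approach. The sample data, the cutoff some_date, and the body of some_scalar_function (here: the position of the largest basket) are assumptions made up for the example, since the question does not show them. The needed columns are selected explicitly before apply so that the function sees only "date" and "basket".

```python
import pandas as pd

# Hypothetical sample data with the four columns from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2020-01-01", "2021-06-01", "2019-03-01", "2019-09-01"]
    ),
    "basket": [3, 7, 5, 2],
    "gender": ["f", "f", "m", "m"],
})

# Hypothetical cutoff; the question only calls it some_date.
some_date = pd.Timestamp("2020-06-01")


def some_scalar_function(basket):
    # Hypothetical stand-in: position of the largest basket in the group.
    return basket.to_numpy().argmax()


def is_valid_date(group):
    # With apply, `group` is a DataFrame, so several columns can be used together.
    idx = some_scalar_function(group["basket"])
    date = group["date"].iloc[idx]
    return date > some_date


# One boolean per id, returned as a Series indexed by id.
result = df.groupby("id")[["date", "basket"]].apply(is_valid_date)
print(result)
```

With this data, id 1 yields True (its largest basket falls on 2021-06-01) and id 2 yields False (its largest basket falls on 2019-03-01).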

Upvotes: 2
