Setting calculated value for column for each group in a dataframe

Question

I have a dataframe where I need to group by column x and set every value of column a within each group to a calculated value that is constant for that group.

I start with a dataframe like this:

x     |   a  |   b 
------+------+-----   
a     |  -1  |  ...
b     |  -1  |  ...
c     |  -1  |  ...
a     |  -1  |  ...
b     |  -1  |  ...
c     |  -1  |  ...

and want to transform it into the dataframe below by grouping by column x and setting column a to the return value of function f:

p = ["k", "l"]

def f(group_number, values):
    # Cycle through values so that any group number maps to a valid entry
    return values[group_number % len(values)]

x     |   a               |   b 
------+-------------------+-----   
a     |  f(ngroup(a), p)  |  ...
b     |  f(ngroup(b), p)  |  ...
c     |  f(ngroup(c), p)  |  ...
a     |  f(ngroup(a), p)  |  ...
b     |  f(ngroup(b), p)  |  ...
c     |  f(ngroup(c), p)  |  ...

ngroup is some function that does exactly what pandas.core.groupby.GroupBy.ngroup() does: it returns a number for every group.
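For illustration, here is a minimal sketch of what ngroup() returns for the x column above. The one-column frame is a stand-in for the real data, and the numbering assumes the default sort=True of groupby, which numbers groups in sorted key order:

```python
import pandas as pd

# Stand-in frame holding only the grouping column from the example above
df = pd.DataFrame({"x": ["a", "b", "c", "a", "b", "c"]})

# One integer per row, identifying the group that row belongs to
group_numbers = df.groupby("x").ngroup()
print(group_numbers.tolist())  # [0, 1, 2, 0, 1, 2]
```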

The overall result should be

x     |  a  |   b 
------+-----+-----   
a     |  k  |  ...
b     |  l  |  ...
c     |  k  |  ...
a     |  k  |  ...
b     |  l  |  ...
c     |  k  |  ...

where all rows with x = a get the same value (k), all rows with x = b get l, and all rows with x = c get k again.

How can I achieve this?

Serge Ballesta · Accepted Answer

What you want to do is

df['a'] = p[df.groupby('x').ngroup() % len(p)]  # TypeError here

Unfortunately, a plain Python list cannot be indexed with a Series, so this raises:

TypeError: list indices must be integers or slices, not Series

But NumPy ndarrays can be indexed with an array of integers, so you can convert p to an array first:

import numpy as np

df['a'] = np.array(p)[df.groupby('x').ngroup() % len(p)]
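As a self-contained sketch of the whole approach, the snippet below rebuilds the example frame from the question and applies the NumPy indexing. The contents of column b are placeholders (the question elides them), and the explicit .to_numpy() call is an optional choice made here to hand NumPy a plain integer array rather than a Series:

```python
import numpy as np
import pandas as pd

p = ["k", "l"]

# Example frame from the question; column b is placeholder data
df = pd.DataFrame({
    "x": ["a", "b", "c", "a", "b", "c"],
    "a": [-1] * 6,
    "b": range(6),
})

# Group number for every row (a -> 0, b -> 1, c -> 2 with default sorting)
group_ids = df.groupby("x").ngroup().to_numpy()

# Integer-array indexing picks one entry of p per row, cycling via modulo
df["a"] = np.array(p)[group_ids % len(p)]
print(df)
```

Every row of a group shares one group number, so the value written to column a is constant within each group, which is what the question asks for.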
