Reputation: 55
I have a dataframe that I need to group by column x, setting every value of column a within each group to a calculated value that is constant for that group.
I start with a dataframe like this:
x | a | b
------+------+-----
a | -1 | ...
b | -1 | ...
c | -1 | ...
a | -1 | ...
b | -1 | ...
c | -1 | ...
and want to transform it into the dataframe below by grouping by column x and setting column a to the return value of function f:
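p = ["k", "l"]

def f(group_number, values):
    # Cycle through values: group 0 -> values[0], group 1 -> values[1], group 2 -> values[0], ...
    return values[group_number % len(values)]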
x | a | b
------+-------------------+-----
a | f(ngroup(a), p) | ...
b | f(ngroup(b), p) | ...
c | f(ngroup(c), p) | ...
a | f(ngroup(a), p) | ...
b | f(ngroup(b), p) | ...
c | f(ngroup(c), p) | ...
ngroup is some function that does exactly what pandas.core.groupby.GroupBy.ngroup() does: it returns a number for every group.
The overall result should be
x | a | b
------+-----+-----
a | k | ...
b | l | ...
c | k | ...
a | k | ...
b | l | ...
c | k | ...
where all rows with x = a have the same value (k), all rows with x = b have the value l, and all rows with x = c have the value k again.
How can I achieve this?
Upvotes: 0
Views: 53
Reputation: 148890
What you want to do is
df['a'] = p[df.groupby('x').ngroup() % len(p)] # TypeError here
Unfortunately, a Python list cannot be indexed with a Series, so this raises
TypeError: list indices must be integers or slices, not Series
NumPy ndarrays do accept an array of integer positions as an index, so (with numpy imported as np) you can just do:
df['a'] = np.array(p)[df.groupby('x').ngroup() % len(p)]
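For completeness, here is a minimal self-contained sketch of the same idea. The sample frame is made up to mirror the layout in the question (the contents of column b are placeholders), and the Series is converted with to_numpy() before indexing just to make the positional lookup explicit:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's frame
df = pd.DataFrame({"x": ["a", "b", "c", "a", "b", "c"], "a": -1, "b": range(6)})
p = ["k", "l"]

# ngroup() numbers the groups 0, 1, 2, ... (in sorted key order by default)
group_ids = df.groupby("x").ngroup()

# Look up one entry of p per row, wrapping around with modulo
df["a"] = np.array(p)[group_ids.to_numpy() % len(p)]
print(df)
```

With the default sort=True, groups a, b and c are numbered 0, 1 and 2, so they map to k, l and k respectively.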
Upvotes: 1