My dataset looks like this:
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 2, 3, 3],
                   "B": ["a", "b", "c", "c", "b", "b", "d", "a", "c"],
                   "C": ["x", "x", "y", "x", "x", "y", "z", "y", "z"]})
>>> df
A B C
0 1 a x
1 1 b x
2 1 c y
3 1 c x
4 2 b x
5 2 b y
6 2 d z
7 3 a y
8 3 c z
I want to perform a groupby using the values of the A column. Specifically, this is the desired output:
A B C
0 1 a b c c [x, x, y, x]
1 2 b b d [x, y, z]
2 3 a c [y, z]
In other words, I want to join all the values of the B column using a single space, and I want to create a list with all the values of the C column.
So far I have been able to create the two desired columns in this way:
B = df.groupby("A")["B"].apply(lambda x: " ".join(x))
C = df.groupby("A")["C"].apply(list)
I am trying to modify both columns of my dataframe in place with a single groupby operation. Is it possible?
Use GroupBy.agg. To prevent A from being converted to the index, pass the as_index=False parameter. The lambda function can also be simplified to " ".join:
df1 = df.groupby("A", as_index=False).agg({'B': " ".join, 'C':list})
print(df1)
A B C
0 1 a b c c [x, x, y, x]
1 2 b b d [x, y, z]
2 3 a c [y, z]
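A self-contained version of the answer above, for anyone who wants to paste and run it. It assumes only that pandas is installed and rebuilds the sample frame from the question. The dict passed to agg maps each column to its own function, so a single groupby produces both columns.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 2, 3, 3],
                   "B": ["a", "b", "c", "c", "b", "b", "d", "a", "c"],
                   "C": ["x", "x", "y", "x", "x", "y", "z", "y", "z"]})

# One groupby, one function per column:
#   " ".join receives each group's B values and joins them with a space,
#   list collects each group's C values into a Python list.
# as_index=False keeps A as an ordinary column instead of the index.
df1 = df.groupby("A", as_index=False).agg({"B": " ".join, "C": list})
print(df1)
```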
Yes, you can use groupby().agg (without as_index=False, the values of A become the index of the result):
df.groupby('A').agg({'B': " ".join, 'C':list})
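A small sketch of the difference between the two answers, plus an alternative spelling. Calling agg without as_index=False leaves A as the index, and reset_index() turns it back into a column. The last line uses pandas' named-aggregation syntax (output_column=(input_column, func)), assuming pandas 0.25 or later; it is an equivalent alternative, not something either answer used.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 2, 3, 3],
                   "B": ["a", "b", "c", "c", "b", "b", "d", "a", "c"],
                   "C": ["x", "x", "y", "x", "x", "y", "z", "y", "z"]})

# Default groupby: the group keys (values of A) become the index.
by_index = df.groupby("A").agg({"B": " ".join, "C": list})

# reset_index() moves A back into a regular column,
# matching the as_index=False result.
flat = by_index.reset_index()

# Named aggregation (pandas >= 0.25): new_column=(source_column, function).
named = df.groupby("A").agg(B=("B", " ".join), C=("C", list)).reset_index()

print(by_index)
print(flat)
print(named)
```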