Reputation: 7611
I have a dataset in the form:
A B
0 30 60538815980
1 30 7410811099
2 26 2238403510
3 26 2006613744
4 26 2006618783
5 26 2006613743
I want to combine the rows where the value of A matches and produce something like that
C_1 C_2 C_3 C_4
A
26 2238403510 2006613744 2006618783 2006613743
30 60538815980 7410811099 NaN NaN
I have tried expressing it in terms of join or merge but have failed so far. Is there any simple way to express that or will I have to use apply and create a new DataFrame?
Upvotes: 2
Views: 3482
Reputation: 353499
I'd also recommend using groupby
but I think we can use pivot
to simplify things. First, we create a new C column with the column labels we want to use, and then we call pivot
:
>>> df["C"] = "C_" + (df.groupby("A").cumcount() + 1).astype(str)
>>> df.pivot(index="A", values="B", columns="C")
C C_1 C_2 C_3 C_4
A
26 2238403510 2006613744 2006618783 2006613743
30 60538815980 7410811099 NaN NaN
Upvotes: 3
Reputation: 109726
First, create a groupby
object based on column A
. Then create a new dataframe df2
which uses ix
to index column B
of each group based on the value n
from column A
. Set the index of this dataframe equal to the key values from the groupby
(i.e. the unique values from column A
).
Finally, use a list comprehension to set the new column values equal to C_1
, C_2
, ..., etc.
df = pd.DataFrame({'A': [30, 30, 26, 26, 26, 26],
'B': [60538815980, 7410811099, 2238403510,
2006613744, 2006618783, 2006613743]})
gb = df.groupby('A')
df2 = pd.DataFrame([df.ix[gb.groups[n], 'B'].values for n in gb.groups],
index=gb.groups.keys())
df2.columns = ["C_" + str(i + 1) for i in df2.columns]
df2.index.name = "A"
>>> df2
C_1 C_2 C_3 C_4
A
26 2238403510 2006613744 2006618783 2006613743
30 60538815980 7410811099 NaN NaN
Upvotes: 4