Reputation: 7
I have this Pandas DataFrame df
:
column1 column2
0 x a
1 x b
2 x c
3 y d
4 y e
5 y f
6 y g
7 z h
8 z i
9 z j
How do I group the values in column2
according to the value in column1
?
Expected output:
x y z
0 a d h
1 b e i
2 c f j
3 g
I'm new to Pandas, I'd really appreciate your help.
Upvotes: 1
Views: 63
Reputation: 1284
Somewhat coding is due to the fact that each column in the solution (res) dataframe is of different size.
Code:
import pandas as pd
import numpy as np
df = pd.DataFrame(data = {'column1' : ['x', 'x', 'x', 'y', 'y', 'y', 'y', 'z', 'z', 'z'], 'column2' : ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']})
print(df)
new_columns = df['column1'].unique().tolist() # ['x', 'y', 'z']
res = pd.DataFrame(columns=new_columns)
res[new_columns[0]] = df[df['column1'] == new_columns[0]]['column2'] # adding first column 'x'
for new_column in new_columns[1:]:
new_col_ser = df[df['column1'] == new_column]['column2']
no_of_rows_to_add = len(new_col_ser) - len(res)
for i in range(no_of_rows_to_add):
res.loc[len(res)+1,:] = np.nan
res[new_column][:len(new_col_ser)] = new_col_ser
print(res)
Output:
column1 column2
0 x a
1 x b
2 x c
3 y d
4 y e
5 y f
6 y g
7 z h
8 z i
9 z j
x y z
0 a d h
1 b e i
2 c f j
4 NaN g NaN
Upvotes: 0
Reputation: 402493
This is a pivot problem with some preprocessing work:
(df.assign(index=df.groupby('column1').transform('cumcount'))
.pivot('index', 'column1', 'column2'))
column1 x y z
index
0 a d h
1 b e i
2 c f j
3 NaN g NaN
We're pivoting using "column1" as the header and "column2" as the values. To make pivoting possible, we need a 3rd column which identifies the uniqueness of the values being pivoted, so we build that with groupby
and cumcount
.
Upvotes: 2