Camoreno
Camoreno

Reputation: 7

How do I group values in different rows that have the same name in Pandas?

I have this Pandas DataFrame df:

    column1   column2
0   x           a
1   x           b
2   x           c
3   y           d
4   y           e
5   y           f
6   y           g
7   z           h
8   z           i
9   z           j

How do I group the values in column2 according to the value in column1?

Expected output:

    x   y   z
0   a   d   h
1   b   e   i
2   c   f   j
3       g

I'm new to Pandas, I'd really appreciate your help.

Upvotes: 1

Views: 63

Answers (2)

Aaj Kaal
Aaj Kaal

Reputation: 1284

Somewhat coding is due to the fact that each column in the solution (res) dataframe is of different size.

Code:

import pandas as pd
import numpy as np

df = pd.DataFrame(data = {'column1' : ['x', 'x', 'x', 'y', 'y', 'y', 'y', 'z', 'z', 'z'],  'column2' : ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']})
print(df)

new_columns = df['column1'].unique().tolist()  # ['x', 'y', 'z']
res = pd.DataFrame(columns=new_columns)
res[new_columns[0]] = df[df['column1'] == new_columns[0]]['column2'] # adding first column 'x'
for new_column in new_columns[1:]:
    new_col_ser = df[df['column1'] == new_column]['column2']
    no_of_rows_to_add = len(new_col_ser) - len(res)
    for i in range(no_of_rows_to_add):
        res.loc[len(res)+1,:] = np.nan
    res[new_column][:len(new_col_ser)] = new_col_ser

print(res)

Output:

  column1 column2
0       x       a
1       x       b
2       x       c
3       y       d
4       y       e
5       y       f
6       y       g
7       z       h
8       z       i
9       z       j
     x  y    z
0    a  d    h
1    b  e    i
2    c  f    j
4  NaN  g  NaN

Upvotes: 0

cs95
cs95

Reputation: 402493

This is a pivot problem with some preprocessing work:

(df.assign(index=df.groupby('column1').transform('cumcount')) 
   .pivot('index', 'column1', 'column2'))

column1    x  y    z
index               
0          a  d    h
1          b  e    i
2          c  f    j
3        NaN  g  NaN

We're pivoting using "column1" as the header and "column2" as the values. To make pivoting possible, we need a 3rd column which identifies the uniqueness of the values being pivoted, so we build that with groupby and cumcount.

Upvotes: 2

Related Questions