suhabari
suhabari

Reputation: 135

Splitting a column into multiple rows

I have this data in a dataframe, The code column has several values and is of object datatype.

enter image description here

I want to split the rows in the following way

result enter image description here

I tried to change the datatype by using

df['Code'] = df['Code'].astype(str)

and then tried to split the commas and reset the index on the basis of ID (unique) but I only get two column values. I need the entire dataset.

df = (pd.DataFrame(df.Code.str.split(',').tolist(), index=df.ID).stack()).reset_index([0, 'ID'])
df.columns = ['ID', 'Code']

Can someone help me out? I don't understand how to twist this code.

Attaching the setup code:

import pandas as pd

x = {'ID': ['1','2','3','4','5','6','7'],
        'A': ['a','b','c','a','b','b','c'],
        'B': ['z','x','y','x','y','z','x'],
        'C': ['s','d','w','','s','s','s'],
        'D': ['m','j','j','h','m','h','h'],
        'Code': ['AB,BC,A','AD,KL','AD,KL','AB,BC','A','A','B']
        }

df = pd.DataFrame(x, columns = ['ID', 'A','B','C','D','Code'])

df

Upvotes: 1

Views: 316

Answers (1)

ThePyGuy
ThePyGuy

Reputation: 18466

You can first split Code column on comma , then explode it to get the desired output.

df['Code']=df['Code'].str.split(',')
df=df.explode('Code')

OUTPUT:

  ID  A  B  C  D Code
0  1  a  z  s  m   AB
0  1  a  z  s  m   BC
0  1  a  z  s  m    A
1  2  b  x  d  j   AD
1  2  b  x  d  j   KL
2  3  c  y  w  j   AD
2  3  c  y  w  j   KL
3  4  a  x     h   AB
3  4  a  x     h   BC
4  5  b  y  s  m    A
5  6  b  z  s  h    A
6  7  c  x  s  h    B

If needed, you can replace empty string by NaN

Upvotes: 2

Related Questions