Reputation: 135
I have this data in a dataframe, The code column has several values and is of object datatype.
I want to split the rows in the following way
I tried to change the datatype by using
df['Code'] = df['Code'].astype(str)
and then tried to split the commas and reset the index on the basis of ID (unique) but I only get two column values. I need the entire dataset.
df = (pd.DataFrame(df.Code.str.split(',').tolist(), index=df.ID).stack()).reset_index([0, 'ID'])
df.columns = ['ID', 'Code']
Can someone help me out? I don't understand how to twist this code.
Attaching the setup code:
import pandas as pd
x = {'ID': ['1','2','3','4','5','6','7'],
'A': ['a','b','c','a','b','b','c'],
'B': ['z','x','y','x','y','z','x'],
'C': ['s','d','w','','s','s','s'],
'D': ['m','j','j','h','m','h','h'],
'Code': ['AB,BC,A','AD,KL','AD,KL','AB,BC','A','A','B']
}
df = pd.DataFrame(x, columns = ['ID', 'A','B','C','D','Code'])
df
Upvotes: 1
Views: 316
Reputation: 18466
You can first split Code
column on comma ,
then explode
it to get the desired output.
df['Code']=df['Code'].str.split(',')
df=df.explode('Code')
OUTPUT:
ID A B C D Code
0 1 a z s m AB
0 1 a z s m BC
0 1 a z s m A
1 2 b x d j AD
1 2 b x d j KL
2 3 c y w j AD
2 3 c y w j KL
3 4 a x h AB
3 4 a x h BC
4 5 b y s m A
5 6 b z s h A
6 7 c x s h B
If needed, you can replace empty string by NaN
Upvotes: 2