Reputation: 69
I have a DataFrame like this:
NAME VAL
A 1;2;3;4;5
B 10;20;30;40;50
C 11;22;33;44;55
D a;b;c;d;e
E 0.0;0.1;0.2;0.3;0.4
I need to convert it into a DataFrame like the one below:
A B C D E
1 10 11 a 0.0
2 20 22 b 0.1
3 30 33 c 0.2
4 40 44 d 0.3
5 50 55 e 0.4
I wrote the code below to get the required output:
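import pandas as pd

# Sample input: one row per name, values packed into a ';'-separated string
df_x = pd.DataFrame([['A', '1;2;3;4;5'],
                     ['B', '10;20;30;40;50'],
                     ['C', '11;22;33;44;55'],
                     ['D', 'a;b;c;d;e'],
                     ['E', '0.0;0.1;0.2;0.3;0.4']], columns=['NAME', 'VAL'])
print(df_x, '\n')

# Build a dict mapping each name to its list of split values
new_dict = dict()
for idx, row in df_x.iterrows():
    new_dict[row['NAME']] = row['VAL'].split(';')

# Each dict key becomes a column of the result
df_y = pd.DataFrame(new_dict)
print(df_y)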
But if the VAL column holds thousands of values, I suspect this is not a very efficient way to get the output. Is there a more efficient approach, for example one that avoids the separate dictionary and works within the DataFrame itself, or any other way?
Upvotes: 0
Views: 43
Reputation: 863611
Use DataFrame.set_index with Series.str.split and transpose with DataFrame.T:
df = df_x.set_index('NAME')['VAL'].str.split(';', expand=True).rename_axis(None).T
print(df, '\n')
A B C D E
0 1 10 11 a 0.0
1 2 20 22 b 0.1
2 3 30 33 c 0.2
3 4 40 44 d 0.3
4 5 50 55 e 0.4
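To make the chain above easier to follow, here is a minimal sketch that runs it one step at a time on the same sample data. Note that Series.str.split returns strings, so every column in the result holds text rather than numbers. The last part shows one possible way to convert the columns that parse cleanly; the helper name to_numeric_if_possible is hypothetical and not part of pandas, and whether you want this conversion at all is an assumption about your use case.

```python
import pandas as pd

df_x = pd.DataFrame(
    [['A', '1;2;3;4;5'],
     ['B', '10;20;30;40;50'],
     ['C', '11;22;33;44;55'],
     ['D', 'a;b;c;d;e'],
     ['E', '0.0;0.1;0.2;0.3;0.4']],
    columns=['NAME', 'VAL'])

# Step 1: make NAME the index and keep VAL as a Series.
val = df_x.set_index('NAME')['VAL']

# Step 2: split each string on ';' into separate columns (one row per NAME).
wide = val.str.split(';', expand=True)

# Step 3: drop the index name 'NAME' and transpose so each NAME becomes a column.
df = wide.rename_axis(None).T


def to_numeric_if_possible(col):
    """Hypothetical helper: return a numeric column, or the original if parsing fails."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col


# apply() calls the helper once per column; text columns such as D are left as they are.
df_typed = df.apply(to_numeric_if_possible)
print(df_typed.dtypes)
```

If all you need is the reshaped text, you can stop after step 3 and skip the conversion.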
Upvotes: 5