Reputation: 49
Thanks for the help in advance. I have a pandas data frame that looks like this:
index source timestamp value
1 car 1 ['98']
2 bike 2 ['98', '100']
3 car 3 ['65']
4 bike 4 ['100', '120']
5 plane 5 ['20' , '12', '30']
What I need is to convert each element of the lists in the 'value' column into its own column, so the output would look like this:
index source timestamp car bike1 bike2 plane1 plane2 plane3
1 car 1 98 Na Na Na Na Na
2 bike 2 Na 98 100 Na Na Na
3 car 3 65 Na Na Na Na Na
4 bike 4 Na 100 120 Na Na Na
5 plane 5 Na Na Na 20 12 30
For car, the list always has one element; for bike, two; and for plane, three. That determines the number of new columns I need in the new data frame. What is the best way to achieve this?
Upvotes: 2
Views: 152
Reputation: 863246
First, if the values are stored as strings rather than real lists, convert them to lists:
import ast
import pandas as pd
df['value'] = df['value'].apply(ast.literal_eval)
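If it is unclear whether the column already holds real lists, a guarded conversion may be safer, since ast.literal_eval raises an error when given a list instead of a string. The following is only a sketch: the helper name to_list and the small sample frame are made up for illustration.

```python
import ast
import pandas as pd

def to_list(cell):
    # Hypothetical helper: parse string cells, pass real lists through unchanged.
    return ast.literal_eval(cell) if isinstance(cell, str) else cell

# Illustrative data: one cell is a string, the other is already a list.
sample = pd.DataFrame({"value": ["['98']", ["98", "100"]]})
sample["value"] = sample["value"].apply(to_list)
print(sample["value"].tolist())
```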
Then create dictionaries for each row:
L = [{f'{i}{x+1}':y for x, y in enumerate(j)} for i, j in zip(df['source'], df['value'])]
print (L)
[{'car1': '98'},
{'bike1': '98', 'bike2': '100'},
{'car1': '65'},
{'bike1': '100', 'bike2': '120'},
{'plane1': '20', 'plane2': '12', 'plane3': '30'}]
Create a DataFrame and join it to the original df:
df = df.join(pd.DataFrame(L, index=df.index))
print (df)
index source timestamp value bike1 bike2 car1 plane1 plane2 plane3
0 1 car 1 [98] NaN NaN 98 NaN NaN NaN
1 2 bike 2 [98, 100] 98 100 NaN NaN NaN NaN
2 3 car 3 [65] NaN NaN 65 NaN NaN NaN
3 4 bike 4 [100, 120] 100 120 NaN NaN NaN NaN
4 5 plane 5 [20, 12, 30] NaN NaN NaN 20 12 30
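For reference, the steps above can be put together into one runnable sketch that also gets closer to the layout asked for in the question. It assumes the 'value' column already contains lists, hard-codes the column order, renames car1 to car, and drops the original 'value' column; those last three choices are assumptions for illustration, not part of the steps above.

```python
import pandas as pd

# Sample data from the question, with 'value' already holding lists.
df = pd.DataFrame({
    "index": [1, 2, 3, 4, 5],
    "source": ["car", "bike", "car", "bike", "plane"],
    "timestamp": [1, 2, 3, 4, 5],
    "value": [["98"], ["98", "100"], ["65"], ["100", "120"], ["20", "12", "30"]],
})

# One dictionary per row: keys are source name plus 1-based position.
rows = [
    {f"{src}{pos + 1}": val for pos, val in enumerate(vals)}
    for src, vals in zip(df["source"], df["value"])
]
wide = pd.DataFrame(rows, index=df.index)

# Assumed column order and the car1 -> car rename, to match the question.
order = ["car1", "bike1", "bike2", "plane1", "plane2", "plane3"]
wide = wide[order].rename(columns={"car1": "car"})

# Drop the list column and attach the new columns.
result = df.drop(columns="value").join(wide)
print(result)
```

The new columns still hold strings; if numbers are needed, they would have to be converted in a separate step.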
Upvotes: 1