Reputation: 2140
I have a dataframe with multiple columns and the content of one of the columns looks like a list:
df = pd.DataFrame({'Emojis':['[1 2 3 4]', '[4 5 6]']})
I want to split the contents of these "lists" into separate columns. Since the lists differ in length, the number of columns should equal the maximum number of items (5 is the max), and any row with fewer items should be padded with null.
So the output will be something like this:
      Emojis  it1  it2  it3   it4   it5
0  [1 2 3 4]    1    2    3     4  null
1    [4 5 6]    4    5    6  null  null
This is what I tried:
splitlist = df['Emojis'].apply(pd.Series)
df2 = pd.concat([df, splitlist], axis=1)
but it's not close to what I want, since the "list" is not really a list: it is stored in df as an object (a string) without commas.
Upvotes: 1
Views: 428
Reputation: 11
You can also use:
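import pandas as pd

df = pd.DataFrame({'Emojis':['[1 2 3 4]', '[4 5 6]']})
for i in range(5):
    # Item i sits at character index 1 + 2*i (single-digit items, single spaces)
    column_name = 'it' + str(i + 1)
    df[column_name] = df['Emojis'].astype(str).str[1 + 2 * i]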
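If items can be longer than one character, slicing by position no longer works. A minimal sketch of a whitespace-split variant, assuming the items are separated by whitespace and wrapped in square brackets (the value 50 in the sample data is made up for illustration):

```python
import pandas as pd

# Sample data with one multi-digit item (hypothetical, for illustration only)
df = pd.DataFrame({'Emojis': ['[1 2 3 4]', '[4 50 6]']})

# Strip the surrounding brackets, then split on whitespace into separate columns
parts = df['Emojis'].str.strip('[]').str.split(expand=True)

# Pad to 5 columns and name them it1..it5
parts = parts.reindex(columns=range(5))
parts.columns = [f'it{i + 1}' for i in range(5)]

out = df.join(parts)
```

The values stay strings here, and missing positions are left as null.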
Upvotes: 1
Reputation: 260790
You can use:
out = df.join(pd.DataFrame(df['Emojis'].str.findall(r'\d+').to_list(),
                           index=df.index)
                .reindex(columns=range(5))
                .rename(columns=lambda x: f'it{x+1}')
              )
Output:
      Emojis it1 it2 it3   it4  it5
0  [1 2 3 4]   1   2   3     4  NaN
1    [4 5 6]   4   5   6  None  NaN
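The extracted items are strings, and the missing entries are a mix of None and NaN. If numeric columns with a single null marker are preferred, one possible follow-up (a sketch, assuming every item is a non-negative integer) is to convert to the nullable Int64 dtype:

```python
import pandas as pd

df = pd.DataFrame({'Emojis': ['[1 2 3 4]', '[4 5 6]']})

# Same extraction as above: find all digit runs, pad to 5 columns, rename
items = (pd.DataFrame(df['Emojis'].str.findall(r'\d+').to_list(), index=df.index)
           .reindex(columns=range(5))
           .rename(columns=lambda x: f'it{x+1}'))

# Convert each column to numbers, then to nullable integers so gaps become <NA>
items = items.apply(pd.to_numeric).astype('Int64')

out = df.join(items)
```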
Upvotes: 2