Reputation: 2853
I have a pandas dataframe that looks like the following. The column props contains lists, and the lists vary in length. I know that the maximum number of entries in a list is 5. I also know that each list is ordered, i.e. the second item always belongs to the column with a specific header, say "Tense" or "number". How can I convert each entry in the lists into separate columns?
id  source       type  target         props                      subtype
2   wyrzucić     V     wyrzucisz      [FUT, 2, SG]               NaN
6   śniadać      V     śniadać        [NFIN]                     NaN
7   bankrutować  V     bankrutujący   [PST, ACT, PL, MASC, HUM]  PTCP
8   chwiać       V     będą chwiały   [FUT, 3, PL]               NaN
23  dobyć        V     dobyłaś        [PST, 2, SG, FEM]          NaN
I have tried solutions with the unstack() and tolist() methods, but they do not work for this specific case.
Upvotes: 2
Views: 1500
Reputation: 9081
You can try this UDF and see if it works -
def col_gen(x):
    # Copy each element of the row's props list into its own column
    props = x['props']
    for i in range(len(props)):
        x['Item' + str(i + 1)] = props[i]
    return x

df = df.apply(col_gen, axis=1)
This takes every row, extracts the list in the props column, and writes each element to an additional column (Item1, Item2, ...).
Upvotes: 1
Reputation: 76917
apply is usually slow. You can instead build a DataFrame from the lists and join it:
In [34]: df.join(pd.DataFrame(df.props.values.tolist()))
Out[34]:
   id                      props     0     1     2     3     4
0   2               [FUT, 2, SG]   FUT     2    SG  None  None
1   6                     [NFIN]  NFIN  None  None  None  None
2   7  [PST, ACT, PL, MASC, HUM]   PST   ACT    PL  MASC   HUM
3   8               [FUT, 3, PL]   FUT     3    PL  None  None
4  23          [PST, 2, SG, FEM]   PST     2    SG   FEM  None
Details
In [33]: df
Out[33]:
   id                      props
0   2               [FUT, 2, SG]
1   6                     [NFIN]
2   7  [PST, ACT, PL, MASC, HUM]
3   8               [FUT, 3, PL]
4  23          [PST, 2, SG, FEM]
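One caveat worth sketching: join aligns on the index, and a DataFrame built from a plain list of lists gets a fresh 0..n-1 index. If df has any other index, passing index=df.index keeps the rows lined up. The following is only a minimal sketch under assumptions: the non-default index values and the "prop_" prefix are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Simplified frame; the index is deliberately not 0..4 (assumption for illustration)
df = pd.DataFrame(
    {
        "id": [2, 6, 7, 8, 23],
        "props": [
            ["FUT", 2, "SG"],
            ["NFIN"],
            ["PST", "ACT", "PL", "MASC", "HUM"],
            ["FUT", 3, "PL"],
            ["PST", 2, "SG", "FEM"],
        ],
    },
    index=[10, 20, 30, 40, 50],
)

# Build the expanded columns from the lists, reusing df's index so join aligns row by row.
# "prop_" is a hypothetical prefix that avoids bare integer column names.
expanded = pd.DataFrame(df["props"].tolist(), index=df.index).add_prefix("prop_")

result = df.join(expanded)
print(result)
```

Shorter lists are padded with missing values on the right, so every row ends up with the same number of new columns.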
Upvotes: 6
Reputation: 38415
Consider this simplified dataframe
df = pd.DataFrame({'id': [2,6,7,8,23], 'props': [['FUT', 2, 'SG'], ['NFIN'], ['PST', 'ACT', 'PL', 'MASC', 'HUM'], ['FUT', 3, 'PL'],['PST', 2, 'SG', 'FEM']]})
You can split the list column using
df[[1,2,3,4,5]] = df.props.apply(pd.Series)
You get
   id                      props     1    2    3     4    5
0   2               [FUT, 2, SG]   FUT    2   SG   NaN  NaN
1   6                     [NFIN]  NFIN  NaN  NaN   NaN  NaN
2   7  [PST, ACT, PL, MASC, HUM]   PST  ACT   PL  MASC  HUM
3   8               [FUT, 3, PL]   FUT    3   PL   NaN  NaN
4  23          [PST, 2, SG, FEM]   PST    2   SG   FEM  NaN
Note: You can specify more relevant column names; I just used 1, 2, 3, 4, 5.
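As a rough illustration of the note about column names, the sketch below expands the lists with apply(pd.Series) and then assigns descriptive headers by position. The names in the list are hypothetical placeholders; the question only mentions headers such as "Tense" and "number", so substitute whatever the real schema is.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [2, 6, 7, 8, 23],
    "props": [
        ["FUT", 2, "SG"],
        ["NFIN"],
        ["PST", "ACT", "PL", "MASC", "HUM"],
        ["FUT", 3, "PL"],
        ["PST", 2, "SG", "FEM"],
    ],
})

# Hypothetical headers, one per list position (at most 5 entries per list)
names = ["tense", "slot_2", "slot_3", "slot_4", "slot_5"]

# Expand each list into its own row of columns; shorter lists are padded with NaN
expanded = df["props"].apply(pd.Series)

# Name the columns by position, using only as many names as there are columns
expanded.columns = names[: expanded.shape[1]]

# Drop the original list column and attach the named columns
result = df.drop(columns="props").join(expanded)
print(result)
```

Slicing the name list guards against the case where the longest list in the data is shorter than 5.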
Upvotes: 1