Amrith Krishna

Reputation: 2853

Converting a dataframe column containing variable-length lists to multiple columns in a dataframe

I have a pandas dataframe that looks like the table below.

The column props contains lists, and the lists vary in length. I know that a list has at most 5 entries. I also know that each list is ordered, i.e. the second item always belongs to a column with a specific header, say "Tense" or "number". How can I convert the entries of each list into separate columns?

id  source   type   target          props                        subtype
2   wyrzucić    V   wyrzucisz       [FUT, 2, SG]                 NaN
6   śniadać     V   śniadać         [NFIN]                       NaN
7   bankrutować V   bankrutujący    [PST, ACT, PL, MASC, HUM]    PTCP
8   chwiać      V   będą chwiały    [FUT, 3, PL]                 NaN
23  dobyć       V   dobyłaś         [PST, 2, SG, FEM]            NaN

I have tried solutions using the unstack() and tolist() methods, but they do not work for this specific case.

Upvotes: 2

Views: 1500

Answers (3)

Vivek Kalyanarangan

Reputation: 9081

You can try this user-defined function and see if it works:

def col_gen(x):
    props = x['props']
    for i in range(len(props)):
        x['Item'+str(i+1)] = props[i]
    return x

df = df.apply(col_gen, axis=1)

This takes each row, reads the list in its props column, and writes each element to an additional column (Item1, Item2, and so on).

Upvotes: 1

Zero

Reputation: 76917

apply is usually slow. You can instead build a DataFrame from the lists and join it back:

In [34]: df.join(pd.DataFrame(df.props.values.tolist()))
Out[34]:
   id                      props     0     1     2     3     4
0   2               [FUT, 2, SG]   FUT     2    SG  None  None
1   6                     [NFIN]  NFIN  None  None  None  None
2   7  [PST, ACT, PL, MASC, HUM]   PST   ACT    PL  MASC   HUM
3   8               [FUT, 3, PL]   FUT     3    PL  None  None
4  23          [PST, 2, SG, FEM]   PST     2    SG   FEM  None

Details

In [33]: df
Out[33]:
   id                      props
0   2               [FUT, 2, SG]
1   6                     [NFIN]
2   7  [PST, ACT, PL, MASC, HUM]
3   8               [FUT, 3, PL]
4  23          [PST, 2, SG, FEM]
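
If the frame does not have the default 0..n-1 index (for example, if id is the index, as it may be in the question), the join above would align on the wrong labels. A minimal sketch of the same idea that passes the original index through and names the columns at the same time; the column names here are placeholders I made up, not the real headers:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [2, 6, 7, 8, 23],
        "props": [
            ["FUT", 2, "SG"],
            ["NFIN"],
            ["PST", "ACT", "PL", "MASC", "HUM"],
            ["FUT", 3, "PL"],
            ["PST", 2, "SG", "FEM"],
        ],
    }
).set_index("id")  # assumption: id is the index, so it is not 0..n-1

# Placeholder headers (hypothetical); replace with the real meaning of each position.
names = ["tense"] + [f"prop_{i}" for i in range(2, 6)]

# Short lists are padded with missing values on the right.
# Reusing df.index keeps every expanded row attached to its original row.
expanded = pd.DataFrame(df["props"].tolist(), index=df.index, columns=names)
result = df.join(expanded)
```

Passing columns=names only works here because the longest list in the data has exactly as many entries as there are names.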

Upvotes: 6

Vaishali

Reputation: 38415

Consider this simplified dataframe

df = pd.DataFrame({'id': [2,6,7,8,23], 'props': [['FUT', 2, 'SG'], ['NFIN'], ['PST', 'ACT', 'PL', 'MASC', 'HUM'], ['FUT', 3, 'PL'],['PST', 2, 'SG', 'FEM']]})

You can split the list column using

df[[1,2,3,4,5]] = df.props.apply(pd.Series)

You get

    id  props                       1       2   3   4       5
0   2   [FUT, 2, SG]                FUT     2   SG  NaN     NaN
1   6   [NFIN]                      NFIN    NaN NaN NaN     NaN
2   7   [PST, ACT, PL, MASC, HUM]   PST     ACT PL  MASC    HUM
3   8   [FUT, 3, PL]                FUT     3   PL  NaN     NaN
4   23  [PST, 2, SG, FEM]           PST     2   SG  FEM     NaN

Note: You can specify more relevant column names; I just used 1, 2, 3, 4, 5.
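
Assigning to five columns works here because one row in the sample happens to hold five entries. A rough sketch of one way to always get five named columns, even when the data at hand contains only shorter lists. It builds the frame with the list constructor rather than apply(pd.Series), and MAX_LEN and the column names are my own placeholders:

```python
import pandas as pd

# A smaller sample in which no list reaches the known maximum of 5 entries.
df = pd.DataFrame(
    {
        "id": [2, 6, 8],
        "props": [["FUT", 2, "SG"], ["NFIN"], ["FUT", 3, "PL"]],
    }
)

MAX_LEN = 5  # known upper bound on the list length, as stated in the question
# Placeholder headers (hypothetical); replace with the real ones.
names = ["tense"] + [f"prop_{i}" for i in range(2, MAX_LEN + 1)]

expanded = (
    pd.DataFrame(df["props"].tolist(), index=df.index)
    .reindex(columns=range(MAX_LEN))  # add all-missing columns up to MAX_LEN
)
expanded.columns = names

df = pd.concat([df, expanded], axis=1)
```

The reindex step is what fixes the width: positions that never occur in the data come out as columns filled with NaN.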

Upvotes: 1
