Split a pipe-separated pandas Series of key=value pairs into multiple columns

Question

I'm relatively new to Python and would like your help with the following problem. My current Series looks like this:


df['word_feats']
------------------------------------------------
0                                                   Case=Loc|Gender=Neut|Number=Sing|Person=3
1                                                   Case=Nom|Gender=Neut|Number=Sing|Person=3
2                                                                              PunctType=Comm
3                                                   Case=Nom|Gender=Neut|Number=Sing|Person=3
4                                                   Case=Nom|Gender=Neut|Number=Sing|Person=3
5                                                                                        None
6                                                   Case=Nom|Gender=Neut|Number=Sing|Person=3
7                                                   Case=Loc|Gender=Neut|Number=Sing|Person=3
8                                                            Gender=Neut|Number=Sing|Person=3
9                                                   Case=Loc|Gender=Neut|Number=Plur|Person=3
10                                                                                       None

The values in this single column are not dictionaries (which would be easier to work with). Instead, each one is a pipe-separated string of key=value pairs.

I'd like to split this Series into multiple columns, using the 'key' (LEFT of the equals sign) as the column name and the 'value' (RIGHT of the equals sign) as the cell value.
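Conceptually, each string is a flattened dict: splitting on '|' gives the key=value fields, and splitting each field once on '=' gives the key (column name) and the value (cell). Below is a minimal plain-Python sketch of that parsing for a single string. The helper name 'parse_feats' is hypothetical, and it assumes keys and values never contain '|' or '=' themselves and that a missing entry is either a real missing value or the literal string 'None'.

```python
def parse_feats(s):
    """Parse 'Key=Val|Key=Val' into a dict (hypothetical helper for illustration)."""
    # Assumption: non-strings (None/NaN), '' and the literal 'None' mean "no features".
    if not isinstance(s, str) or s in ("", "None"):
        return {}
    # Split into fields on '|', then split each field once on '=' into (key, value).
    return dict(field.split("=", 1) for field in s.split("|"))


print(parse_feats("Case=Loc|Gender=Neut|Number=Sing|Person=3"))
# {'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing', 'Person': '3'}
```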

I've tried something along these lines:

df['word_feats'].str.split('|', expand=True)

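This doesn't work because:

- it doesn't use the 'key' as the column name (the columns are just numbered 0, 1, 2, ...), and
- my values end up in the wrong columns, because the fields are placed by position: a row without Case, such as row 8, puts Gender=Neut in the column where the other rows have their Case value.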
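A small sketch of that failure, using three rows from the Series above rebuilt by hand: 'str.split' with expand=True assigns fields to columns purely by position, so the columns are labeled 0, 1, 2, ... and a row lacking Case shifts everything one column to the left.

```python
import pandas as pd

# Three rows copied from the question's Series.
s = pd.Series([
    "Case=Loc|Gender=Neut|Number=Sing|Person=3",
    "PunctType=Comm",
    "Gender=Neut|Number=Sing|Person=3",
])

# Fields go into columns by position; short rows are padded with missing values.
parts = s.str.split("|", expand=True)
print(parts)

# Column 0 mixes 'Case=...', 'PunctType=...' and 'Gender=...' fields.
print(parts[0].tolist())
```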

Would appreciate any answers for this! Thanks.

Shubham Sharma · Accepted Answer

We can use a regular expression to find every key=value pair in each row. Because the pattern has two capture groups, str.findall returns a list of (key, value) tuples for each row; we then map each list to a dict and construct a DataFrame from the resulting records:
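A sketch of the intermediate step on two of the question's rows, rebuilt by hand. In the pattern, '[^|=]+' matches the key (anything up to '=' or '|') and '[^|]+' matches the value (anything up to the next '|'); with two groups, each match comes back as a (key, value) tuple, and dict() turns each row's list of tuples into a record whose keys become the column names.

```python
import pandas as pd

s = pd.Series(["Case=Loc|Gender=Neut|Number=Sing|Person=3", "PunctType=Comm"])

# Two capture groups -> each row becomes a list of (key, value) tuples.
pairs = s.str.findall(r"([^|=]+)=([^|]+)")
print(pairs[0])  # [('Case', 'Loc'), ('Gender', 'Neut'), ('Number', 'Sing'), ('Person', '3')]

# dict() turns each list of pairs into a row record keyed by feature name.
records = [dict(p) for p in pairs]
print(records[1])  # {'PunctType': 'Comm'}
```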

pd.DataFrame(map(dict, df['word_feats'].str.findall(r'([^|=]+)=([^|]+)')))

   Case Gender Number Person PunctType
0   Loc   Neut   Sing      3       NaN
1   Nom   Neut   Sing      3       NaN
2   NaN    NaN    NaN    NaN      Comm
3   Nom   Neut   Sing      3       NaN
4   Nom   Neut   Sing      3       NaN
5   NaN    NaN    NaN    NaN       NaN
6   Nom   Neut   Sing      3       NaN
7   Loc   Neut   Sing      3       NaN
8   NaN   Neut   Sing      3       NaN
9   Loc   Neut   Plur      3       NaN
10  NaN    NaN    NaN    NaN       NaN
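Rows 5 and 10 come out as all NaN, which is consistent with those cells holding the string 'None': findall finds no pairs, dict([]) is {}, and an empty record becomes a row of NaN. If the column holds real missing values (Python None or NaN) instead, str.findall returns a missing value for that row and dict() raises TypeError. A hedged variant, assuming you also want to keep the other columns of df: fill missing values with an empty string first, pass index=df.index so the rows stay aligned, and join the result back. 'other_col' and the non-default index are hypothetical, for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "other_col": ["a", "b", "c"],  # hypothetical extra column
        "word_feats": ["Case=Loc|Gender=Neut", None, "PunctType=Comm"],
    },
    index=[10, 11, 12],  # non-default index to show that alignment is preserved
)

feats = pd.DataFrame(
    # fillna("") turns real missing values into '', so findall returns [] for them.
    map(dict, df["word_feats"].fillna("").str.findall(r"([^|=]+)=([^|]+)")),
    index=df.index,  # reuse the original row labels so join lines up
)
result = df.join(feats)
print(result)
```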
