Sam

Reputation: 131

Splitting a string in a dataframe

I have a dataframe like this:

col1|col2
{"test":"23","test1":"12"}|1992
{"test":"24","test1":"19","test3":"24"}|1993
{"test":"27","test1":"20","test3":"21","test4":"40"}|1994

I want a dataframe like this:

col1_a|col1_b|col2
test|23|1992
test1|12|1992
test|24|1993
test1|19|1993
...

How can I achieve this? Although the data looks like a dictionary, it is stored as a string in the dataframe.

Upvotes: 2

Views: 69

Answers (2)

Equinox

Reputation: 6748

Expand the dictionary values into columns, then melt the table back down to long format.

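import pandas as pd

# The dicts are built directly here; string data would need parsing first (e.g. json.loads)
df = pd.DataFrame([[{"test":"23","test1":"12"},1992],
                   [{"test":"24","test1":"19","test3":"24"},1993],
                   [{"test":"27","test1":"20","test3":"21","test4":"40"},1994]],
                  columns=['c1','c2'])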

# Each dict becomes a row of columns (missing keys -> NaN); melt to long form, drop the NaNs
pd.DataFrame(df['c1'].values.tolist(), index=df.c2) \
    .reset_index() \
    .melt(id_vars='c2', var_name='col1_a', value_name='col1_b') \
    .dropna()

Output:

    c2  col1_a  col1_b
0   1992    test    23
1   1993    test    24
2   1994    test    27
3   1992    test1   12
4   1993    test1   19
5   1994    test1   20
7   1993    test3   24
8   1994    test3   21
11  1994    test4   40
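The snippet above builds real dictionaries, but the question says col1 stores them as strings. A minimal sketch, assuming those strings are valid JSON (as they are in the sample), parses them with json.loads before the same expand-and-melt steps. The final sort and column reordering are additions meant to match the layout the question asks for.

```python
import json

import pandas as pd

# Assumption: col1 holds JSON-formatted strings, as described in the question
df = pd.DataFrame({
    "col1": [
        '{"test":"23","test1":"12"}',
        '{"test":"24","test1":"19","test3":"24"}',
        '{"test":"27","test1":"20","test3":"21","test4":"40"}',
    ],
    "col2": [1992, 1993, 1994],
})

# Parse each string into a dict, then expand the keys into columns indexed by col2
wide = pd.DataFrame(df["col1"].map(json.loads).tolist(), index=df["col2"])

# Melt back to long format; keys missing from a row became NaN and are dropped
result = (
    wide.reset_index()
    .melt(id_vars="col2", var_name="col1_a", value_name="col1_b")
    .dropna()
    .sort_values(["col2", "col1_a"])
    .reset_index(drop=True)
    .loc[:, ["col1_a", "col1_b", "col2"]]
)
print(result)
```

The values in col1_b stay strings, exactly as they were in the JSON.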

Upvotes: 1

Mayank Porwal

Reputation: 34056

Consider the df below, for example:

In [2063]: df = pd.DataFrame({'col1':[{"test":"23","test1":"12"}, {"test":"24","test1":"19","test3":"24"}, {"test":"27","test1":"20","test3":"21","test4":"40"}], 'col2':[1992, 1993, 1994]})

In [2064]: df
Out[2064]: 
                                                col1  col2
0                      {'test': '23', 'test1': '12'}  1992
1       {'test': '24', 'test1': '19', 'test3': '24'}  1993
2  {'test': '27', 'test1': '20', 'test3': '21', '...  1994

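You can use Series.apply to turn each dict into a list of (key, value) tuples, then DataFrame.explode() (pandas 0.25 or later) to give each tuple its own row: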

In [2085]: df.col1 = df.col1.apply(lambda x: list(x.items()))

In [2086]: df = df.explode('col1')

In [2091]: df[['col1_a', 'col1_b']] = pd.DataFrame(df.col1.tolist(), index=df.index)

In [2093]: df = df[['col1_a', 'col1_b', 'col2']]

In [2094]: df
Out[2094]: 
  col1_a col1_b  col2
0   test     23  1992
0  test1     12  1992
1   test     24  1993
1  test1     19  1993
1  test3     24  1993
2   test     27  1994
2  test1     20  1994
2  test3     21  1994
2  test4     40  1994
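As an alternative to apply plus explode (a different technique from the one above, not the answerer's), the long table can also be built with a plain Python list comprehension. This sketch again assumes col1 holds JSON strings, as the question states. Each dict contributes one (key, value, col2) row, keeping the original row and key order.

```python
import json

import pandas as pd

# Assumption: col1 holds JSON strings, as stated in the question
df = pd.DataFrame({
    "col1": [
        '{"test":"23","test1":"12"}',
        '{"test":"24","test1":"19","test3":"24"}',
        '{"test":"27","test1":"20","test3":"21","test4":"40"}',
    ],
    "col2": [1992, 1993, 1994],
})

# One output row per (key, value) pair, carrying that row's col2 value along
rows = [
    (key, value, year)
    for text, year in zip(df["col1"], df["col2"])
    for key, value in json.loads(text).items()
]
result = pd.DataFrame(rows, columns=["col1_a", "col1_b", "col2"])
print(result)
```

If col1_b should be numeric, result["col1_b"].astype(int) converts it.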

Upvotes: 2
