Reputation: 131
I have a dataframe like this:
col1|col2
{"test":"23","test1":"12"}|1992
{"test":"24","test1":"19","test3":"24"}|1993
{"test":"27","test1":"20","test3":"21","test4":"40"}|1994
I want a data frame like this:
col1_a|col1_b|col2
test|23|1992
test1|12|1992
test|24|1993
test1|19|1993
...
How can I achieve this? Although each value in col1 looks like a dictionary, it is stored as a string in the dataframe.
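A minimal sketch of the parsing step. It assumes the strings are valid JSON, as in the sample (double-quoted keys and values), and uses the column names col1/col2 shown above. json.loads turns each string into a real dict, after which the dictionary-based approaches work unchanged. ast.literal_eval would also parse these particular strings.

```python
import json
import pandas as pd

# Assumed input: col1 holds JSON-formatted strings, as described in the question
df = pd.DataFrame({
    'col1': ['{"test":"23","test1":"12"}',
             '{"test":"24","test1":"19","test3":"24"}',
             '{"test":"27","test1":"20","test3":"21","test4":"40"}'],
    'col2': [1992, 1993, 1994],
})

# Replace each JSON string with the dict it encodes
df['col1'] = df['col1'].map(json.loads)
print(df['col1'][0])
```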
Upvotes: 2
Views: 69
Reputation: 6748
Expand the dictionary values into columns, then melt the table back down to one row per key/value pair.
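import pandas as pd

df = pd.DataFrame([[{"test":"23","test1":"12"},1992],
                   [{"test":"24","test1":"19","test3":"24"},1993],
                   [{"test":"27","test1":"20","test3":"21","test4":"40"},1994]],
                  columns=['c1','c2'])
# One column per dictionary key (NaN where a year lacks that key), indexed by year
(pd.DataFrame(df['c1'].values.tolist(), index=df.c2)
   .reset_index()
   # Unpivot the key columns into (col1_a, col1_b) rows
   .melt(id_vars='c2', var_name='col1_a', value_name='col1_b')
   # Discard the rows created for missing keys
   .dropna())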
Output:
c2 col1_a col1_b
0 1992 test 23
1 1993 test 24
2 1994 test 27
3 1992 test1 12
4 1993 test1 19
5 1994 test1 20
7 1993 test3 24
8 1994 test3 21
11 1994 test4 40
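The melted rows come out grouped by key (all test rows, then all test1 rows, and so on) rather than by year as in the question's desired output. A sketch that adds a sort and reorders the columns, assuming col1 already holds dicts and using the question's names col1/col2 instead of c1/c2:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [{"test": "23", "test1": "12"},
             {"test": "24", "test1": "19", "test3": "24"},
             {"test": "27", "test1": "20", "test3": "21", "test4": "40"}],
    'col2': [1992, 1993, 1994],
})

out = (pd.DataFrame(df['col1'].tolist(), index=df['col2'])  # one column per key, NaN where absent
         .reset_index()
         .melt(id_vars='col2', var_name='col1_a', value_name='col1_b')
         .dropna()                              # drop keys a year did not have
         .sort_values(['col2', 'col1_a'])       # group rows by year, keys in sorted order
         .reset_index(drop=True)
         [['col1_a', 'col1_b', 'col2']])        # column order from the question
print(out)
```

Sorting on col1_a is lexicographic; it happens to match the dictionary order for these keys, but would differ for keys such as test10.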
Upvotes: 1
Reputation: 34056
Consider the following df, for example:
import pandas as pd

df = pd.DataFrame({'col1': [{"test":"23","test1":"12"},
                            {"test":"24","test1":"19","test3":"24"},
                            {"test":"27","test1":"20","test3":"21","test4":"40"}],
                   'col2': [1992, 1993, 1994]})
print(df)
Output:
col1 col2
0 {'test': '23', 'test1': '12'} 1992
1 {'test': '24', 'test1': '19', 'test3': '24'} 1993
2 {'test': '27', 'test1': '20', 'test3': '21', '... 1994
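You can use apply to turn each dictionary into a list of (key, value) pairs, then DataFrame.explode() (available from pandas 0.25) to give each pair its own row: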
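# Turn each dictionary into a list of (key, value) tuples
df['col1'] = df['col1'].apply(lambda x: list(x.items()))
# One row per tuple; the original row index is repeated
df = df.explode('col1')
# Split each tuple into two columns, aligned on the repeated index
df[['col1_a', 'col1_b']] = pd.DataFrame(df['col1'].tolist(), index=df.index)
df = df[['col1_a', 'col1_b', 'col2']]
print(df)
Output: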
col1_a col1_b col2
0 test 23 1992
0 test1 12 1992
1 test 24 1993
1 test1 19 1993
1 test3 24 1993
2 test 27 1994
2 test1 20 1994
2 test3 21 1994
2 test4 40 1994
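A different technique that avoids explode (and so also runs on pandas versions older than 0.25) is to build the long table directly with a list comprehension. This sketch starts from the string column described in the question and assumes each string is valid JSON; rows keep the dictionary's own key order within each year.

```python
import json
import pandas as pd

# Assumed input: col1 holds JSON-formatted strings, as in the question
df = pd.DataFrame({
    'col1': ['{"test":"23","test1":"12"}',
             '{"test":"24","test1":"19","test3":"24"}',
             '{"test":"27","test1":"20","test3":"21","test4":"40"}'],
    'col2': [1992, 1993, 1994],
})

# One output row per (key, value) pair in each parsed dictionary
rows = [(key, value, year)
        for text, year in zip(df['col1'], df['col2'])
        for key, value in json.loads(text).items()]
out = pd.DataFrame(rows, columns=['col1_a', 'col1_b', 'col2'])
print(out)
```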
Upvotes: 2