Anselmo
Anselmo

Reputation: 75

DataFrame Pandas - How can I split a list of dictionary from each row to separated columns?

I have the DataFrame below with 2 columns, and one of the columns is a list of dictionary inside an list of dictionary. I would like to split/separate this column in several columns.

import pandas as pd

USERNAME = ['root', 'user1', 'user2','user3']
test_data = '[{"conjunction":"and","expressions":[{"_actualOperator":"contains","_actualValue":"LBD","attr":"displayName","op":"contains","value":"LBD"}],"name":"test_Event","editable":true}]'
test_data2 = '[{"conjunction":"and","expressions":[{"_actualOperator":"not_contains","_actualValue":"AAA","attr":"Event","op":"contains","value":"LBD"}],"name":"test_Event","editable":true}]'
test_data3 = '[{"conjunction":"and","expressions":[{"_actualOperator":"exclude","_actualValue":"BBB","attr":"Event","op":"contains","value":"LBD"}],"name":"test_Event","editable":true}]'
test_data4 = '[{"conjunction":"and","expressions":[{"_actualOperator":"adding","_actualValue":"CASA","attr":"displayName","op":"contains","value":"LBD"}],"name":"test_Event","editable":true}]'
VALUE_STRING = [test_data, test_data2, test_data3, test_data4]

data = {'USERNAME': ['root', 'user1', 'user2','user3'], 'VALUE_STRING' : VALUE_STRING}
df = pd.DataFrame(data)
df

USERNAME    VALUE_STRING
root        [{"conjunction":"and","expressions":[{"_actual...
user1       [{"conjunction":"and","expressions":[{"_actual...
user2       [{"conjunction":"and","expressions":[{"_actual...
user2       [{"conjunction":"and","expressions":[{"_actual...

And I expected a result like this:

df_expected = pd.DataFrame({'USERNAME': ['root', 'user1', 'user2','user3'], 
                            '_actualOperator':['contains','not_contains','exclude','adding'],
                           '_actualValue':['LBD','AAA','BBB','CASA'],
                           'attr':['displayName','Event','Event','displayName']})
df_expected

USERNAME    _actualOperator     _actualValue    attr
root        contains            LBD             displayName
user1       not_contains        AAA             Event
user2       exclude             BBB             Event
user3       adding              CASA            displayName

Upvotes: 0

Views: 81

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

It seems like the column VALUE_STRING contains json data, in that case we can parse the json data using loads method of json module, then extract the dictionaries associated with the key expressions from each row, create a new dataframe from these dictionaries and join back with USERNAME column

import json

s = df['VALUE_STRING'].map(json.loads)\
     .str[0].str['expressions'].str[0]

exp = pd.DataFrame([*s], index=s.index)
df_out = df[['USERNAME']].join(exp).drop(['op', 'value'], axis=1)

Alternative approach with pandas json_normalize method

s = df['VALUE_STRING'].map(json.loads).str[0]
exp = pd.json_normalize(s, 'expressions')
df_out = df[['USERNAME']].join(exp).drop(['op', 'value'], axis=1)

print(df_out)

  USERNAME _actualOperator _actualValue         attr
0     root        contains          LBD  displayName
1    user1    not_contains          AAA        Event
2    user2         exclude          BBB        Event
3    user3          adding         CASA  displayName

Upvotes: 1

Related Questions