user1584253
user1584253

Reputation: 1005

parsing a list of dictionaries in a pandas data frame rows

I have a list dictionaries in pandas dataframe column. I want to parse it and create new rows from it even though other column value repeat.

Here is the dataframe:

event_date  event_timestamp event_name  event_params
20191118    1.57401E+15 user_engagement [{'key': 'firebase_event_origin', 'value': {'string_value': 'auto', 'int_value': None, 'float_value': None, 'double_value': None}}, {'key': 'ga_session_number', 'value': {'string_value': None, 'int_value': 5, 'float_value': None, 'double_value': None}}, {'key': 'engagement_time_msec', 'value': {'string_value': None, 'int_value': 17167, 'float_value': None, 'double_value': None}}, {'key': 'firebase_screen_id', 'value': {'string_value': None, 'int_value': 9065232440298470924, 'float_value': None, 'double_value': None}}, {'key': 'ga_session_id', 'value': {'string_value': None, 'int_value': 1574005142, 'float_value': None, 'double_value': None}}, {'key': 'firebase_screen_class', 'value': {'string_value': 'SplashActivity', 'int_value': None, 'float_value': None, 'double_value': None}}, {'key': 'engaged_session_event', 'value': {'string_value': None, 'int_value': 1, 'float_value': None, 'double_value': None}}]
20191119    1.57401E+15 screen_view [{'key': 'firebase_previous_id', 'value': {'string_value': None, 'int_value': 9065232440298470924, 'float_value': None, 'double_value': None}}, {'key': 'firebase_event_origin', 'value': {'string_value': 'auto', 'int_value': None, 'float_value': None, 'double_value': None}}, {'key': 'ga_session_number', 'value': {'string_value': None, 'int_value': 5, 'float_value': None, 'double_value': None}}, {'key': 'firebase_screen_id', 'value': {'string_value': None, 'int_value': 9065232440298470925, 'float_value': None, 'double_value': None}}, {'key': 'firebase_previous_class', 'value': {'string_value': 'SplashActivity', 'int_value': None, 'float_value': None, 'double_value': None}}, {'key': 'ga_session_id', 'value': {'string_value': None, 'int_value': 1574005142, 'float_value': None, 'double_value': None}}, {'key': 'firebase_screen_class', 'value': {'string_value': 'AuthenticationActivity', 'int_value': None, 'float_value': None, 'double_value': None}}, {'key': 'engaged_session_event', 'value': {'string_value': None, 'int_value': 1, 'float_value': None, 'double_value': None}}]
20191120    1.57401E+15 user_engagement [{'key': 'firebase_event_origin', 'value': {'string_value': 'auto', 'int_value': None, 'float_value': None, 'double_value': None}}, {'key': 'ga_session_number', 'value': {'string_value': None, 'int_value': 5, 'float_value': None, 'double_value': None}}, {'key': 'engagement_time_msec', 'value': {'string_value': None, 'int_value': 13271, 'float_value': None, 'double_value': None}}, {'key': 'firebase_screen_id', 'value': {'string_value': None, 'int_value': 9065232440298470925, 'float_value': None, 'double_value': None}}, {'key': 'ga_session_id', 'value': {'string_value': None, 'int_value': 1574005142, 'float_value': None, 'double_value': None}}, {'key': 'firebase_screen_class', 'value': {'string_value': 'AuthenticationActivity', 'int_value': None, 'float_value': None, 'double_value': None}}, {'key': 'engaged_session_event', 'value': {'string_value': None, 'int_value': 1, 'float_value': None, 'double_value': None}}]

This is what I want

event_date  event_timestamp event_name      key                    value
20191118    1.57401E+15     user_engagement firebase_event_origin   auto
20191118    1.57401E+15     user_engagement ga_session_number        5
20191118    1.57401E+15     user_engagement engagement_time_msec    17167
20191119    1.57401E+15     screen_view     firebase_previous_id    9.06523E+18
20191119    1.57401E+15     screen_view     engaged_session_event    1

This is what I have tried:

pd.DataFrame(data['event_params'].apply(ast.literal_eval).values.tolist()) \
        .stack() \
        .reset_index(level=0,drop=True) \
        .reset_index()

It gives me following output:

index     0
0        {'key': 'firebase_event_origin', 'value': {'st...
1       {'key': 'ga_session_number', 'value': {'string...
2       {'key': 'engagement_time_msec', 'value': {'str...

How do I parse it more populating columns 'key' and 'value'. Also, making other column values duplicate. Kindly assist me.

UPDATE: Solution tried

Solution tried as per @bharath

Upvotes: 1

Views: 2376

Answers (2)

Bharath_Raja
Bharath_Raja

Reputation: 642

You have to explode your Data Frame First using pandas.series.explode() and then write a couple of for loops to get the expected result. Here is the Answer.

import pandas as pd

d = {'event_date': [1, 2], 'event_name': [3, 4] ,'event_params': [[{'key': 'firebase_event_origin', 'value': {'string_value': 'auto', 'int_value': None, 'float_value': None, 'double_value': None}}, {'key': 'ga_session_number', 'value': {'string_value': None, 'int_value': 5, 'float_value': None, 'double_value': None}}, {'key': 'engagement_time_msec', 'value': {'string_value': None, 'int_value': 17167, 'float_value': None, 'double_value': None}}, {'key': 'firebase_screen_id', 'value': {'string_value': None, 'int_value': 9065232440298470924, 'float_value': None, 'double_value': None}}, {'key': 'ga_session_id', 'value': {'string_value': None, 'int_value': 1574005142, 'float_value': None, 'double_value': None}}, {'key': 'firebase_screen_class', 'value': {'string_value': 'SplashActivity', 'int_value': None, 'float_value': None, 'double_value': None}}, {'key': 'engaged_session_event', 'value': {'string_value': None, 'int_value': 1, 'float_value': None, 'double_value': None}}], [{'key': 'firebase_previous_id', 'value': {'string_value': None, 'int_value': 9065232440298470924, 'float_value': None, 'double_value': None}}, {'key': 'firebase_event_origin', 'value': {'string_value': 'auto', 'int_value': None, 'float_value': None, 'double_value': None}}, {'key': 'ga_session_number', 'value': {'string_value': None, 'int_value': 5, 'float_value': None, 'double_value': None}}, {'key': 'firebase_screen_id', 'value': {'string_value': None, 'int_value': 9065232440298470925, 'float_value': None, 'double_value': None}}, {'key': 'firebase_previous_class', 'value': {'string_value': 'SplashActivity', 'int_value': None, 'float_value': None, 'double_value': None}}, {'key': 'ga_session_id', 'value': {'string_value': None, 'int_value': 1574005142, 'float_value': None, 'double_value': None}}, {'key': 'firebase_screen_class', 'value': {'string_value': 'AuthenticationActivity', 'int_value': None, 'float_value': None, 'double_value': None}}, {'key': 'engaged_session_event', 'value': {'string_value': None, 'int_value': 1, 'float_value': None, 'double_value': None}}]]}


df = pd.DataFrame(d)

df = df.explode('event_params').reset_index(drop = True)

df['key'] = None

for i in range(len(df)):
    df.loc[i, 'key'] = df.loc[i, 'event_params']['key']

df['value'] = None

for i in range(len(df)):
    for k in df.loc[i, 'event_params']['value']:
        if df.loc[i, 'event_params']['value'][k]!=None:
            df.loc[i, 'value'] = df.loc[i, 'event_params']['value'][k]


df.drop(columns = 'event_params', inplace = True)

Upvotes: 2

Chris
Chris

Reputation: 29742

Using pandas.DataFrame.explode:

new_df = df.explode('event_params')
tmp = pd.DataFrame(list(new_df.pop('event_params')))
new_df['value'] = tmp['value'].apply(lambda x:next(i for i in x.values() if i is not None ))

Output:

   event_date  event_timestamp       event_name  value
0    20191118     1.574010e+15  user_engagement   auto
0    20191118     1.574010e+15  user_engagement   auto
0    20191118     1.574010e+15  user_engagement   auto
0    20191118     1.574010e+15  user_engagement   auto
0    20191118     1.574010e+15  user_engagement   auto
0    20191118     1.574010e+15  user_engagement   auto
0    20191118     1.574010e+15  user_engagement   auto
1    20191119     1.574010e+15      screen_view      5
1    20191119     1.574010e+15      screen_view      5
1    20191119     1.574010e+15      screen_view      5
1    20191119     1.574010e+15      screen_view      5
1    20191119     1.574010e+15      screen_view      5
1    20191119     1.574010e+15      screen_view      5
1    20191119     1.574010e+15      screen_view      5
1    20191119     1.574010e+15      screen_view      5
2    20191120     1.574010e+15  user_engagement  17167
2    20191120     1.574010e+15  user_engagement  17167
2    20191120     1.574010e+15  user_engagement  17167
2    20191120     1.574010e+15  user_engagement  17167
2    20191120     1.574010e+15  user_engagement  17167
2    20191120     1.574010e+15  user_engagement  17167
2    20191120     1.574010e+15  user_engagement  17167

Upvotes: 0

Related Questions