Reputation: 73
I have data like this, and I want to load it into a DataFrame so that I can write it directly to a CSV file.
Data =
[ {'event': 'User Clicked', 'properties': {'user_id': '123', 'page_visited': 'contact_us', etc}},
{'event': 'User Clicked', 'properties': {'user_id': '456', 'page_visited': 'homepage', etc}}, ......
{'event': 'User Clicked', 'properties': {'user_id': '789', 'page_visited': 'restaurant', etc}} ]
This is how I am currently able to access its values:
for item in list_of_dict_responses:
    print(item['event'])
    for key, value in item.items():
        # Only 'properties' holds a nested dictionary.
        if isinstance(value, dict):
            for k, v in value.items():
                print(k, v)
I want a DataFrame where event is a column with the value User Clicked, and properties is another column with sub-columns such as user_id and page_visited, each holding its respective values.
Upvotes: 1
Views: 2287
Reputation: 28233
Flatten the nested dictionaries, then use the DataFrame constructor to create the data frame.
data = [
{'event': 'User Clicked', 'properties': {'user_id': '123', 'page_visited': 'contact_us'}},
{'event': 'User Clicked', 'properties': {'user_id': '456', 'page_visited': 'homepage'}},
{'event': 'User Clicked', 'properties': {'user_id': '789', 'page_visited': 'restaurant'}}
]
The flattened dictionary may be constructed in several ways. Here is one method using a recursive generator; it is generic and works with nested dictionaries of arbitrary depth (at least until it hits the maximum recursion depth).
def flatten(kv, prefix=()):
    """Yield (flat_key, value) pairs, joining nested keys with '_'."""
    for k, v in kv.items():
        path = prefix + (str(k),)
        if isinstance(v, dict):
            # Recurse into nested dicts, carrying the key path along.
            yield from flatten(v, path)
        else:
            yield '_'.join(path), v
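To see which key names such a generator produces, here is a small sketch that applies it to a single record nested two levels deep. The `geo` and `city` keys are hypothetical and added only for illustration; the function is an equivalent restatement of the one above so that the snippet runs on its own.

```python
def flatten(kv, prefix=()):
    # Equivalent to the generator above, repeated so this snippet is self-contained.
    for k, v in kv.items():
        path = prefix + (str(k),)
        if isinstance(v, dict):
            yield from flatten(v, path)
        else:
            yield '_'.join(path), v

# Hypothetical record: 'geo' and 'city' are not part of the original data.
record = {
    'event': 'User Clicked',
    'properties': {'user_id': '123', 'geo': {'city': 'Paris'}},
}

flat = dict(flatten(record))
print(flat)
# {'event': 'User Clicked', 'properties_user_id': '123', 'properties_geo_city': 'Paris'}
```

Each level of nesting contributes one more underscore-separated segment to the column name.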
Then use a list comprehension to flatten all the records in data and construct the data frame:
import pandas as pd

pd.DataFrame([dict(flatten(kv)) for kv in data])
#Out
          event  properties_page_visited  properties_user_id
0  User Clicked               contact_us                 123
1  User Clicked                 homepage                 456
2  User Clicked               restaurant                 789
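As an alternative to a hand-written flattener, pandas also provides `pd.json_normalize`, which flattens nested dictionaries into columns. This is a different technique from the generator in this answer, shown only as a sketch; it assumes pandas 1.0 or later, where the function is available at the top level of the package.

```python
import pandas as pd

data = [
    {'event': 'User Clicked', 'properties': {'user_id': '123', 'page_visited': 'contact_us'}},
    {'event': 'User Clicked', 'properties': {'user_id': '456', 'page_visited': 'homepage'}},
    {'event': 'User Clicked', 'properties': {'user_id': '789', 'page_visited': 'restaurant'}},
]

# sep='_' joins nested keys with an underscore instead of the default '.'
df = pd.json_normalize(data, sep='_')
print(df.columns.tolist())
```

The resulting columns are named event, properties_user_id and properties_page_visited, matching the names produced by the generator approach.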
Upvotes: 2
Reputation: 164623
You have 2 options: either use a MultiIndex for columns, or add a prefix for data in properties. The former, in my opinion, is not appropriate here, since you don't have a "true" hierarchical columnar structure. The second level, for example, would be empty for event.
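For comparison, the first option (MultiIndex columns) could be sketched roughly as follows. This is an illustration under assumptions, not a recommendation: the empty string used as the second level for event is my own placeholder choice, and the sample `data` list is restated so the snippet runs on its own.

```python
import pandas as pd

data = [
    {'event': 'User Clicked', 'properties': {'user_id': '123', 'page_visited': 'contact_us'}},
    {'event': 'User Clicked', 'properties': {'user_id': '456', 'page_visited': 'homepage'}},
    {'event': 'User Clicked', 'properties': {'user_id': '789', 'page_visited': 'restaurant'}},
]

# Key every value by a (top level, second level) tuple.
# 'event' has no natural second level, so it gets an empty string (assumption).
rows = [
    {('event', ''): d['event'],
     **{('properties', k): v for k, v in d['properties'].items()}}
    for d in data
]

df = pd.DataFrame(rows)
# Make sure the tuple column labels are treated as a two-level MultiIndex.
df.columns = pd.MultiIndex.from_tuples(list(df.columns))

# Selecting the top level returns the sub-columns as their own frame.
print(df['properties'])
```

Note how event needs an artificial second level, which is the awkwardness described above.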
Implementing the second idea, you can restructure your list of dictionaries before feeding it to pd.DataFrame. The syntax {**d1, **d2} is used to combine two dictionaries.
import pandas as pd

# Keep 'event' as is and prefix every key found under 'properties'.
data_transformed = [{**{'event': d['event']},
                     **{f'properties_{k}': v for k, v in d['properties'].items()}}
                    for d in Data]
res = pd.DataFrame(data_transformed)
print(res)
          event  properties_page_visited  properties_user_id
0  User Clicked               contact_us                 123
1  User Clicked                 homepage                 456
2  User Clicked               restaurant                 789
This also aids writing to and reading from CSV files, where a MultiIndex can be ambiguous.
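To illustrate the CSV point, here is a small sketch that writes a prefixed frame to an in-memory buffer and reads it back. `io.StringIO` stands in for a real file, and passing `dtype=str` to `read_csv` is an assumption on my part so that user IDs such as '123' stay strings instead of being parsed as integers.

```python
import io

import pandas as pd

# Same shape as the prefixed result above, restated so the snippet is self-contained.
res = pd.DataFrame({
    'event': ['User Clicked'] * 3,
    'properties_page_visited': ['contact_us', 'homepage', 'restaurant'],
    'properties_user_id': ['123', '456', '789'],
})

buffer = io.StringIO()
res.to_csv(buffer, index=False)  # flat columns give a single header row
buffer.seek(0)

# dtype=str keeps the IDs as text when reading back (assumption, see above).
restored = pd.read_csv(buffer, dtype=str)
print(restored.columns.tolist())
```

Because the column names are plain strings, the header survives the round trip without any extra arguments describing column levels.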
Upvotes: 0