Reputation: 73

List of Dictionaries to DataFrame

I have data like this, and I want to load it into a DataFrame so that I can write it directly to a CSV file.

Data =
[ {'event': 'User Clicked', 'properties': {'user_id': '123', 'page_visited': 'contact_us', etc}},
  {'event': 'User Clicked', 'properties': {'user_id': '456', 'page_visited': 'homepage', etc}}, ......
  {'event': 'User Clicked', 'properties': {'user_id': '789', 'page_visited': 'restaurant', etc}} ]

This is how I am able to access its values:

for item in list_of_dict_responses:
    print(item['event'])
    for key, value in item.items():
        # Only the nested 'properties' dictionary needs a second loop.
        if isinstance(value, dict):
            for k, v in value.items():
                print(k, v)

I want a DataFrame in which event is one column, holding the value User Clicked, and properties is another column with sub-columns such as user_id and page_visited, each holding its respective values.

Upvotes: 1

Answers (2)

Haleemur Ali

Reputation: 28313

Flatten the nested dictionaries, then use the DataFrame constructor to create the data frame.

data = [ 
  {'event': 'User Clicked', 'properties': {'user_id': '123', 'page_visited': 'contact_us'}},
  {'event': 'User Clicked', 'properties': {'user_id': '456', 'page_visited': 'homepage'}},
  {'event': 'User Clicked', 'properties': {'user_id': '789', 'page_visited': 'restaurant'}} 
]

The flattened dictionary can be constructed in several ways. Here is one method, using a recursive generator, that is generic and works with arbitrarily deep nested dictionaries (at least until it hits the maximum recursion depth):

def flatten(kv, prefix=()):
    # Recursively yield (flattened_key, value) pairs; nested keys are joined with '_'.
    for k, v in kv.items():
        path = (*prefix, str(k))
        if isinstance(v, dict):
            yield from flatten(v, path)
        else:
            yield '_'.join(path), v
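
To illustrate the arbitrary-depth claim, here is a small sketch that runs the same generator on a record nested one level deeper. The geo and city keys are hypothetical and do not appear in the question's data.

```python
def flatten(kv, prefix=[]):
    # Same generator as above: yield (flattened_key, value) pairs.
    for k, v in kv.items():
        if isinstance(v, dict):
            yield from flatten(v, prefix + [str(k)])
        else:
            yield '_'.join(prefix + [str(k)]), v

# Hypothetical record with an extra nesting level (not from the question).
record = {
    'event': 'User Clicked',
    'properties': {'user_id': '123', 'geo': {'city': 'Oslo'}},
}

flat_record = dict(flatten(record))
print(flat_record)
```

Each level of nesting contributes one segment to the key, so the innermost value ends up under properties_geo_city.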

Then, using a list comprehension to flatten every record in data, construct the data frame:

import pandas as pd

pd.DataFrame([dict(flatten(kv)) for kv in data])
#Out
          event properties_page_visited properties_user_id
0  User Clicked              contact_us                123
1  User Clicked                homepage                456
2  User Clicked              restaurant                789
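
As an alternative sketch that is not part of the approach above: assuming pandas 1.0 or later, the built-in pd.json_normalize performs a similar flattening, and its sep parameter sets the string used to join nested keys (the default is '.').

```python
import pandas as pd

data = [
    {'event': 'User Clicked', 'properties': {'user_id': '123', 'page_visited': 'contact_us'}},
    {'event': 'User Clicked', 'properties': {'user_id': '456', 'page_visited': 'homepage'}},
    {'event': 'User Clicked', 'properties': {'user_id': '789', 'page_visited': 'restaurant'}},
]

# sep='_' reproduces the 'properties_<key>' naming used by flatten().
df = pd.json_normalize(data, sep='_')
print(df)
```

The hand-written generator remains useful when the flattening rules need customising, for example to skip certain keys.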

Upvotes: 2

jpp

Reputation: 164843

You have two options: either use a MultiIndex for the columns, or add a prefix to the keys in properties. The former, in my opinion, is not appropriate here, since you don't have a "true" hierarchical columnar structure. The second level, for example, would be empty for event.
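
For comparison, here is a rough sketch of what the first option might look like. The variable names are illustrative assumptions, and the data is the cleaned-up sample from the question. Note the empty second level that event needs.

```python
import pandas as pd

data = [
    {'event': 'User Clicked', 'properties': {'user_id': '123', 'page_visited': 'contact_us'}},
    {'event': 'User Clicked', 'properties': {'user_id': '456', 'page_visited': 'homepage'}},
    {'event': 'User Clicked', 'properties': {'user_id': '789', 'page_visited': 'restaurant'}},
]

# Build plain columns first, then attach a two-level column index.
df_multi = pd.DataFrame({
    'event': [d['event'] for d in data],
    'user_id': [d['properties']['user_id'] for d in data],
    'page_visited': [d['properties']['page_visited'] for d in data],
})
df_multi.columns = pd.MultiIndex.from_tuples([
    ('event', ''),                   # no natural second level, so it is left empty
    ('properties', 'user_id'),
    ('properties', 'page_visited'),
])

# Selecting the top-level key returns only the sub-columns beneath it.
print(df_multi['properties'])
```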

Implementing the second idea, you can restructure your list of dictionaries before feeding to pd.DataFrame. The syntax {**d1, **d2} is used to combine two dictionaries.

import pandas as pd

# Merge the top-level 'event' entry with the prefixed 'properties' entries.
data_transformed = [{**{'event': d['event']},
                     **{f'properties_{k}': v for k, v in d['properties'].items()}}
                    for d in Data]

res = pd.DataFrame(data_transformed)

print(res)

          event properties_page_visited properties_user_id
0  User Clicked              contact_us                123
1  User Clicked                homepage                456
2  User Clicked              restaurant                789

This also aids writing to and reading from CSV files, where a MultiIndex can be ambiguous.
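
As a minimal sketch of that CSV round trip, the frame below is rebuilt by hand from the printed result, and an in-memory buffer stands in for a real file path (both are assumptions made to keep the example self-contained).

```python
import io

import pandas as pd

res = pd.DataFrame({
    'event': ['User Clicked'] * 3,
    'properties_page_visited': ['contact_us', 'homepage', 'restaurant'],
    'properties_user_id': ['123', '456', '789'],
})

# Write without the row index so the CSV holds only the three data columns.
buffer = io.StringIO()
res.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# dtype=str keeps user_id as text; otherwise read_csv would parse it as integers.
restored = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(restored)
```

With flat, prefixed column names the header is a single row, so no extra read_csv arguments are needed to recover the column structure.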

Upvotes: 0
