Turning a dictionary inside of a dataframe into a new dataframe and selecting rows of it

Question

I have a dataframe that looks like this:

    code   size type  type_tops
0   123    5.11 A .   [dictionary]
1   345    5.5  B .   [dictionary]
2   543    6.2  B .   [dictionary]

Those dictionaries look like this:

{'size': {6640: 6.2, 10481: 6.4, 6585: 6.1}, 'Speed': {6640: 119685000.0, 10481: 145793000.0, 6585: 200021000.0}, 'type': {6640: '62.0', 10481: '62.0', 6585: '62.0'}, 'name': {6640: 'John', 10481: 'Mark', 6585: 'Weasley'}

I want to create a new column called "highest_guy" and another called "highest_guy_size". Each row has a different dictionary. Let's suppose that code 123 has this dictionary above. The first row should look like this:

    code   size type  type_tops       highest_guy    highest_guy_size
0   123    5.11 A .   [dictionary]    Mark           6.4

I suppose I would need to turn that dict into a dataframe and select the highest guy and his size. I could do that using a for loop, something like this:

for i in df.index:
    tops=pd.DataFrame(df['type_tops][i].sort_values(['size'].reset_index())
    df[highest_guy]=tops['name'][0]
    df['highest_guy_size']=tops['size][0]

But is there a way to make it without a for?

Simon · Accepted Answer

You can do this using a groupby and apply.

Theres no need to turn each dict into a separate dataframe - you can just get the id of the tallest individual and use that as a key to get values from the nested name and size dicts:

import pandas as pd

df = pd.DataFrame({'code': [123, 345, 543],
                   'size': [5.11, 5.5, 6.2],
                   'type': ['A', 'B', 'B'],
                   'type_tops': [
                            {'size': {6640: 6.2, 10481: 6.4, 6585: 6.1},
                             'Speed': {6640: 119685000.0, 10481: 145793000.0, 6585: 200021000.0},
                             'type': {6640: '62.0', 10481: '62.0', 6585: '62.0'},
                             'name': {6640: 'John', 10481: 'Mark', 6585: 'Weasley'}},
                            {'size': {6640: 6.2, 10481: 6.4, 6585: 6.9},
                             'Speed': {6640: 119685000.0, 10481: 145793000.0, 6585: 200021000.0},
                             'type': {6640: '62.0', 10481: '62.0', 6585: '62.0'},
                             'name': {6640: 'John', 10481: 'Mark', 6585: 'Weasley'}},
                            {'size': {6640: 6.7, 10481: 6.4, 6585: 6.1},
                             'Speed': {6640: 119685000.0, 10481: 145793000.0, 6585: 200021000.0},
                             'type': {6640: '62.0', 10481: '62.0', 6585: '62.0'},
                             'name': {6640: 'John', 10481: 'Mark', 6585: 'Weasley'}}
                           ]
                   })


def extract_vals(df):
    cur_df = df

    # get the dictionary
    cur_dict = dict(cur_df['type_tops'].item())

    # get the key/id of the tallest individual
    highest_id = max(cur_dict['size'], key=lambda k: cur_dict['size'][k])

    # use id to get their name and size
    highest_guy_size = cur_dict['size'][highest_id]
    highest_guy = cur_dict['name'][highest_id]

    cur_df['highest_guy'] = highest_guy
    cur_df['highest_guy_size'] = highest_guy_size

    return cur_df


new = df.groupby('code').apply(lambda x: extract_vals(x))

That will give you an output dataframe that looks like this:

Turning a dictionary inside of a dataframe into a new dataframe and selecting rows of it

Answers (2)

Related Questions