E. Zeytinci
E. Zeytinci

Reputation: 2643

How can I create a hierarchical json from rows in pandas?

I have a dataframe like that,

values = [
    [
        1, 1, 'Standard Room', 'adult', 'John Doe'
    ],
    [
        1, 1, 'Standard Room', 'adult', 'John Noakes'
    ],
    [
        1, 2, 'Standard Room', 'adult', 'Richard Miles'
    ],
    [
        2, 1, 'Standard Room', 'adult', 'John Smith'
    ],
    [
        2, 1, 'Standard Room', 'adult', 'Joe Public'
    ]
]

columns = [
    'booking_id',
    'room_id',
    'room_name',
    'guest_type',
    'guest_name'
]

df = pd.DataFrame.from_records(values, columns=columns)
print(df)

   booking_id  room_id      room_name guest_type     guest_name
0           1        1  Standard Room      adult       John Doe
1           1        1  Standard Room      adult    John Noakes
2           1        2  Standard Room      adult  Richard Miles
3           2        1  Standard Room      adult     John Smith
4           2        1  Standard Room      adult     Joe Public

I just want to create a json for each booking_id. And that json should contains room_id, guest_type and guest_name fields. I've been try something like that,

df.groupby(['booking_id', 'room_id', 'room_name']).apply(lambda x: x.to_dict(orient='records'))

But in this way, there are some duplicate values. For instance there are two same room_id in dictionaries,

[{'booking_id': 1,
  'room_id': 1,
  'room_name': 'Standard Room',
  'guest_type': 'adult',
  'guest_name': 'John Doe'},
 {'booking_id': 1,
  'room_id': 1,
  'room_name': 'Standard Room',
  'guest_type': 'adult',
  'guest_name': 'John Noakes'}]

But, as I said before, I want to create hierarchical json for each booking_id. Here is the expected output,

output_values = [
    [1, {
    'rooms': [
                {
                    'room_id': 1,
                    'room_name': 'Standard Room',
                    'guests': [
                        {
                            'guest_type': 'adult',
                            'guest_name': 'John Doe'
                        },
                        {
                            'guest_type': 'adult',
                            'guest_name': 'John Noakes'
                        }
                    ]
                },
                {
                    'room_id': 2,
                    'room_name': 'Standard Room',
                    'guests': [
                        {
                            'guest_type': 'adult',
                            'guest_name': 'Richard Miles'
                        }
                    ]
                }
            ]
        }
    ],
    [2, {
            'rooms': [
                {
                    'room_id': 1,
                    'room_name': 'Standard Room',
                    'guests': [
                        {
                            'guest_type': 'adult',
                            'guest_name': 'John Smith'
                        },
                        {
                            'guest_type': 'adult',
                            'guest_name': 'Joe Public'
                        }
                    ]
                }
            ]
        }
    ]
]

print(pd.DataFrame.from_records(output_values, columns=['booking_id', 'rooms']))

   booking_id                                              rooms
0           1  {'rooms': [{'room_id': 1, 'room_name': 'Standa...
1           2  {'rooms': [{'room_id': 1, 'room_name': 'Standa...

Any idea? Thanks in advance.

Upvotes: 1

Views: 95

Answers (2)

Alexandra Dudkina
Alexandra Dudkina

Reputation: 4462

Another solution with uncool iteration:

d = [{ int(df.loc[i]['booking_id']): {
    'rooms': [
      {
        'room_id' : int(room_id),
        'room_name' : df.loc[i]['room_name'],
        'guests' : [
          {
            'guest_type': df.loc[g]['guest_type'],
            'guest_name': df.loc[g]['guest_name']
          }
          for g in df[(df['booking_id'] == df.loc[i]['booking_id']) & (df['room_id'] == room_id)].index]
      } for room_id in set(df[df['booking_id'] == df.loc[i]['booking_id']]['room_id'].values) ]
    } for i in df.index }]

result = json.dumps(d, indent=4)

print(result)

Upvotes: 1

Henry Yik
Henry Yik

Reputation: 22503

A lot of reshaping are required including multiple groupby, apply and zip:

s = (df.groupby(["booking_id", "room_id", "room_name"])
       .apply(lambda d: pd.Series({"guests": [{k: v for k,v in zip(["guest_type", "guest_name"], i)}
                                              for i in d[["guest_type", "guest_name"]].values]}))
       .reset_index(["room_id", "room_name"]) )

res = [[g, {'rooms': i.to_dict("records")}] for g, i in s.groupby(level=0)]

import pprint

pprint.pprint(res)

[[1,
  {'rooms': [{'guests': [{'guest_name': 'John Doe', 
                          'guest_type': 'adult'},
                         {'guest_name': 'John Noakes',
                          'guest_type': 'adult'}],
              'room_id': 1,
              'room_name': 'Standard Room'},
             {'guests': [{'guest_name': 'Richard Miles',
                          'guest_type': 'adult'}],
              'room_id': 2,
              'room_name': 'Standard Room'}]}],
 [2,
  {'rooms': [{'guests': [{'guest_name': 'John Smith', 
                          'guest_type': 'adult'},
                         {'guest_name': 'Joe Public',
                          'guest_type': 'adult'}],
              'room_id': 1,
              'room_name': 'Standard Room'}]}]]

Upvotes: 2

Related Questions