Reputation: 2643
I have a dataframe like that,
values = [
[
1, 1, 'Standard Room', 'adult', 'John Doe'
],
[
1, 1, 'Standard Room', 'adult', 'John Noakes'
],
[
1, 2, 'Standard Room', 'adult', 'Richard Miles'
],
[
2, 1, 'Standard Room', 'adult', 'John Smith'
],
[
2, 1, 'Standard Room', 'adult', 'Joe Public'
]
]
columns = [
'booking_id',
'room_id',
'room_name',
'guest_type',
'guest_name'
]
df = pd.DataFrame.from_records(values, columns=columns)
print(df)
booking_id room_id room_name guest_type guest_name
0 1 1 Standard Room adult John Doe
1 1 1 Standard Room adult John Noakes
2 1 2 Standard Room adult Richard Miles
3 2 1 Standard Room adult John Smith
4 2 1 Standard Room adult Joe Public
I just want to create a json for each booking_id
. And that json should contains room_id
, guest_type
and guest_name
fields. I've been try something like that,
df.groupby(['booking_id', 'room_id', 'room_name']).apply(lambda x: x.to_dict(orient='records'))
But in this way, there are some duplicate values. For instance there are two same room_id
in dictionaries,
[{'booking_id': 1,
'room_id': 1,
'room_name': 'Standard Room',
'guest_type': 'adult',
'guest_name': 'John Doe'},
{'booking_id': 1,
'room_id': 1,
'room_name': 'Standard Room',
'guest_type': 'adult',
'guest_name': 'John Noakes'}]
But, as I said before, I want to create hierarchical json for each booking_id
. Here is the expected output,
output_values = [
[1, {
'rooms': [
{
'room_id': 1,
'room_name': 'Standard Room',
'guests': [
{
'guest_type': 'adult',
'guest_name': 'John Doe'
},
{
'guest_type': 'adult',
'guest_name': 'John Noakes'
}
]
},
{
'room_id': 2,
'room_name': 'Standard Room',
'guests': [
{
'guest_type': 'adult',
'guest_name': 'Richard Miles'
}
]
}
]
}
],
[2, {
'rooms': [
{
'room_id': 1,
'room_name': 'Standard Room',
'guests': [
{
'guest_type': 'adult',
'guest_name': 'John Smith'
},
{
'guest_type': 'adult',
'guest_name': 'Joe Public'
}
]
}
]
}
]
]
print(pd.DataFrame.from_records(output_values, columns=['booking_id', 'rooms']))
booking_id rooms
0 1 {'rooms': [{'room_id': 1, 'room_name': 'Standa...
1 2 {'rooms': [{'room_id': 1, 'room_name': 'Standa...
Any idea? Thanks in advance.
Upvotes: 1
Views: 95
Reputation: 4462
Another solution with uncool iteration:
d = [{ int(df.loc[i]['booking_id']): {
'rooms': [
{
'room_id' : int(room_id),
'room_name' : df.loc[i]['room_name'],
'guests' : [
{
'guest_type': df.loc[g]['guest_type'],
'guest_name': df.loc[g]['guest_name']
}
for g in df[(df['booking_id'] == df.loc[i]['booking_id']) & (df['room_id'] == room_id)].index]
} for room_id in set(df[df['booking_id'] == df.loc[i]['booking_id']]['room_id'].values) ]
} for i in df.index }]
result = json.dumps(d, indent=4)
print(result)
Upvotes: 1
Reputation: 22503
A lot of reshaping are required including multiple groupby
, apply
and zip
:
s = (df.groupby(["booking_id", "room_id", "room_name"])
.apply(lambda d: pd.Series({"guests": [{k: v for k,v in zip(["guest_type", "guest_name"], i)}
for i in d[["guest_type", "guest_name"]].values]}))
.reset_index(["room_id", "room_name"]) )
res = [[g, {'rooms': i.to_dict("records")}] for g, i in s.groupby(level=0)]
import pprint
pprint.pprint(res)
[[1,
{'rooms': [{'guests': [{'guest_name': 'John Doe',
'guest_type': 'adult'},
{'guest_name': 'John Noakes',
'guest_type': 'adult'}],
'room_id': 1,
'room_name': 'Standard Room'},
{'guests': [{'guest_name': 'Richard Miles',
'guest_type': 'adult'}],
'room_id': 2,
'room_name': 'Standard Room'}]}],
[2,
{'rooms': [{'guests': [{'guest_name': 'John Smith',
'guest_type': 'adult'},
{'guest_name': 'Joe Public',
'guest_type': 'adult'}],
'room_id': 1,
'room_name': 'Standard Room'}]}]]
Upvotes: 2