Reputation: 537
I have a pandas data frame with a lot of columns. I want to create a new dataframe with just two columns. The first column should contain all values which appear in a specific column of the original data frame. The second column should contain all other data of the original data frame where the value of the first column matches.
For example my input data frame is structured like this:
Name Menu City
0 Foo Burgers Burgers and Fries New York
1 Cheesy's Cheeseburgers New York
2 Buggy Burgers Insect Burgers London
3 Fry Guy Fries London
4 Beermania Beer Berlin
In code:
df = pd.DataFrame([["Foo Burgers", "Burgers and Fries", "New York"],
["Cheesy's", "Cheeseburgers", "New York"],
["Buggy Burgers", "Insect Burgers", "London"],
["Fry Guy", "Fries", "London"],
["Beermania", "Beer", "Munich"]], columns=["Name","Menu","City"])
How can I easily convert the data frame to the following goal structure?
City Restaurants
0 New York [{"Name": "Foo Burgers", "Menu": "Burgers and Fries"}, {"Name":"Cheesy's", "Menu": "Cheeseburgers"}]
1 London [{"Name": "Buggy Burgers", "Menu": "Insect Burgers"}, {"Name":"Fry Guy", "Menu": "Fries"}]
2 Munich [{'Name': 'Beermania', 'Menu': 'Beer'}]
In code:
goal_df = pd.DataFrame([["New York", [{"Name": "Foo Burgers", "Menu": "Burgers and Fries"}, {"Name":"Cheesy's", "Menu": "Cheeseburgers"}], ],
["London", [{"Name": "Buggy Burgers", "Menu": "Insect Burgers"}, {"Name":"Fry Guy", "Menu": "Fries"}], ],
["Munich", [{"Name": "Beermania", "Menu": "Beer"}], ]], columns=["City", "Restaurants"])
Upvotes: 0
Views: 305
Reputation: 150735
You can do a groupby().agg
with to_dict
:
(df.drop('City', axis=1).groupby(df['City'])
.apply(lambda x: x.to_dict(orient='records'))
.reset_index(name='Restaurants')
)
Output:
City Restaurants
0 London [{'Name': 'Buggy Burgers', 'Menu': 'Insect Bur...
1 Munich [{'Name': 'Beermania', 'Menu': 'Beer'}]
2 New York [{'Name': 'Foo Burgers', 'Menu': 'Burgers and ...
Upvotes: 1