Giuseppe Cianci
Giuseppe Cianci

Reputation: 537

Grouping data frame by specific column value

I have a pandas data frame with a lot of columns. I want to create a new dataframe with just two columns. The first column should contain all values which appear in a specific column of the original data frame. The second column should contain all other data of the original data frame where the value of the first column matches.

For example my input data frame is structured like this:

    Name            Menu                City
0   Foo Burgers     Burgers and Fries   New York
1   Cheesy's        Cheeseburgers       New York
2   Buggy Burgers   Insect Burgers      London
3   Fry Guy         Fries               London
4   Beermania       Beer                Berlin

In code:

df = pd.DataFrame([["Foo Burgers", "Burgers and Fries", "New York"], 
                   ["Cheesy's", "Cheeseburgers", "New York"],
                   ["Buggy Burgers", "Insect Burgers", "London"],
                   ["Fry Guy", "Fries", "London"],
                   ["Beermania", "Beer", "Munich"]], columns=["Name","Menu","City"])

How can I easily convert the data frame to the following goal structure?

    City        Restaurants
0   New York    [{"Name": "Foo Burgers", "Menu": "Burgers and Fries"}, {"Name":"Cheesy's", "Menu": "Cheeseburgers"}]
1   London      [{"Name": "Buggy Burgers", "Menu": "Insect Burgers"}, {"Name":"Fry Guy", "Menu": "Fries"}]
2   Munich      [{'Name': 'Beermania', 'Menu': 'Beer'}]

In code:

goal_df = pd.DataFrame([["New York", [{"Name": "Foo Burgers", "Menu": "Burgers and Fries"}, {"Name":"Cheesy's", "Menu": "Cheeseburgers"}], ],
                        ["London", [{"Name": "Buggy Burgers", "Menu": "Insect Burgers"}, {"Name":"Fry Guy", "Menu": "Fries"}], ],
                        ["Munich", [{"Name": "Beermania", "Menu": "Beer"}], ]], columns=["City", "Restaurants"])

Upvotes: 0

Views: 305

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150735

You can do a groupby().agg with to_dict:

(df.drop('City', axis=1).groupby(df['City'])
   .apply(lambda x: x.to_dict(orient='records'))
   .reset_index(name='Restaurants')
)

Output:

       City                                        Restaurants
0    London  [{'Name': 'Buggy Burgers', 'Menu': 'Insect Bur...
1    Munich            [{'Name': 'Beermania', 'Menu': 'Beer'}]
2  New York  [{'Name': 'Foo Burgers', 'Menu': 'Burgers and ...

Upvotes: 1

Related Questions