How do I retain the column name used in my group by with Pandas

Question

I have two data frames. I would like to use group by on the second data frame and then merge the two together on the Company Name column. The issue is that with my group by statement I loose the Company Name column.

import pandas as pd

df1 = pd.DataFrame(
    {
        'Company Name': ['Google','Google','Microsoft','Microsoft','Amazon','Amazon'],
        'Location': ['Somewhere','Somewhere','Somewhere','Somewhere','Somewhere','Somewhere'],
    }
)

df = pd.DataFrame(
    {
        'Company Name': ['Google','Google','Microsoft','Microsoft','Amazon','Amazon'],
        'Sales': [12345,12345,12345,12345,12345,12345],
        'Company Type': ['Software','Software','Software','Software','Software','Software']
    }
)
df = df.groupby(['Company Name']).sum()

pd.merge(df1,df,how="inner",on="Company Name")

I get an error message when merging due to df not having a Company Name column to perform the join.

U13-Forward · Accepted Answer

Replace this line:

df = df.groupby(['Company Name']).sum()

With:

df = df.groupby('Company Name', as_index=False).sum()

Then your code will work as expected, and return:

  Company Name   Location  Sales
0       Google  Somewhere  24690
1       Google  Somewhere  24690
2    Microsoft  Somewhere  24690
3    Microsoft  Somewhere  24690
4       Amazon  Somewhere  24690
5       Amazon  Somewhere  24690

How do I retain the column name used in my group by with Pandas

Answers (1)

Related Questions