user2242044
user2242044

Reputation: 9213

Groupby in Pandas yields Groupby Object rather than Dataframe

I have a Pandas dataframe that shows how much money people spent over the month of January and February. I would like to use a groupby function to group the months by person, but my code is yielding a DataFrameGroupBy object rather than an actual dataframe. I also have a column for gender that I would just like to leave in place.

code:

import pandas as pd
df = pd.DataFrame(data=[['Person A', 5, 21, 'Male'], ['Person B', 15, 3, 'Female']], columns=['Names', 'Jan', 'Feb', 'Gender'])
print df.groupby(['Names', 'Jan', 'Feb'])

output:

<pandas.core.groupby.DataFrameGroupBy object at 0x020D4470>

Starting dataframe:

      Names  Jan  Feb  Gender
0  Person A    5   21    Male
1  Person B   15    3    Female

Desired Output:

            Names  Value    Gender
0  Person A - Jan      5     Male
1  Person A - Feb     21     Male
2  Person B - Jan     15     Female
3  Person B - Feb      3     Female

Upvotes: 1

Views: 192

Answers (2)

flyingmeatball
flyingmeatball

Reputation: 7997

Another solution using stack.

df_out = df.set_index(['Names']).stack().to_frame().reset_index()
df_out.columns = ['Names','month','value']

Edit

This should also work:

stack_df = df.set_index(['Names', 'Gender']).stack().to_frame().reset_index()
stack_df.columns = ['Names','Gender','Month', 'Value']

Upvotes: 1

jezrael
jezrael

Reputation: 862441

You can use melt with sort_values, then concanecate columns and last drop column variable:

df1 = pd.melt(df, id_vars='Names').sort_values('Names')
df1['Names'] = df1['Names'] + '- ' + df1['variable']
df1 = df1.drop('variable', axis=1)
print df1
           Names  value
0  Person A- Jan      5
2  Person A- Feb     21
1  Person B- Jan     15
3  Person B- Feb      3

Another one line solution with assign:

print pd.melt(df, id_vars='Names').sort_values('Names')
        .assign(Names = lambda x: x['Names'] + '- ' + x['variable'])
        .drop('variable', axis=1)

           Names  value
0  Person A- Jan      5
2  Person A- Feb     21
1  Person B- Jan     15
3  Person B- Feb      3

EDIT:

You can add new column to parameter id_vars:

df1 = pd.melt(df, id_vars=['Names', 'Gender']).sort_values('Names')
df1['Names'] = df1['Names'] + '- ' + df1['variable']
df1 = df1.drop('variable', axis=1)
df1 = df1[['Names','value','Gender']]
print df1
           Names  value  Gender
0  Person A- Jan      5    Male
2  Person A- Feb     21    Male
1  Person B- Jan     15  Female
3  Person B- Feb      3  Female

One line solution, if you need reorder columns use reindex_axis:

print pd.melt(df, id_vars=['Names', 'Gender'])
        .sort_values('Names')
        .assign(Names = lambda x: x['Names'] + '- ' + x['variable'])
        .drop('variable', axis=1)
        .reindex_axis(['Names','value','Gender'], axis=1)

           Names  value  Gender
0  Person A- Jan      5    Male
2  Person A- Feb     21    Male
1  Person B- Jan     15  Female
3  Person B- Feb      3  Female

Upvotes: 3

Related Questions