Reputation: 9213
I have a Pandas
dataframe
that shows how much money people spent over the month of January and February. I would like to use a groupby
function to group the months by person, but my code is yielding a DataFrameGroupBy object
rather than an actual dataframe. I also have a column for gender that I would just like to leave in place.
code:
import pandas as pd
df = pd.DataFrame(data=[['Person A', 5, 21, 'Male'], ['Person B', 15, 3, 'Female']], columns=['Names', 'Jan', 'Feb', 'Gender'])
print df.groupby(['Names', 'Jan', 'Feb'])
output:
<pandas.core.groupby.DataFrameGroupBy object at 0x020D4470>
Starting dataframe:
Names Jan Feb Gender
0 Person A 5 21 Male
1 Person B 15 3 Female
Desired Output:
Names Value Gender
0 Person A - Jan 5 Male
1 Person A - Feb 21 Male
2 Person B - Jan 15 Female
3 Person B - Feb 3 Female
Upvotes: 1
Views: 192
Reputation: 7997
Another solution using stack.
df_out = df.set_index(['Names']).stack().to_frame().reset_index()
df_out.columns = ['Names','month','value']
Edit
This should also work:
stack_df = df.set_index(['Names', 'Gender']).stack().to_frame().reset_index()
stack_df.columns = ['Names','Gender','Month', 'Value']
Upvotes: 1
Reputation: 862441
You can use melt
with sort_values
, then concanecate columns and last drop
column variable
:
df1 = pd.melt(df, id_vars='Names').sort_values('Names')
df1['Names'] = df1['Names'] + '- ' + df1['variable']
df1 = df1.drop('variable', axis=1)
print df1
Names value
0 Person A- Jan 5
2 Person A- Feb 21
1 Person B- Jan 15
3 Person B- Feb 3
Another one line solution with assign
:
print pd.melt(df, id_vars='Names').sort_values('Names')
.assign(Names = lambda x: x['Names'] + '- ' + x['variable'])
.drop('variable', axis=1)
Names value
0 Person A- Jan 5
2 Person A- Feb 21
1 Person B- Jan 15
3 Person B- Feb 3
EDIT:
You can add new column to parameter id_vars
:
df1 = pd.melt(df, id_vars=['Names', 'Gender']).sort_values('Names')
df1['Names'] = df1['Names'] + '- ' + df1['variable']
df1 = df1.drop('variable', axis=1)
df1 = df1[['Names','value','Gender']]
print df1
Names value Gender
0 Person A- Jan 5 Male
2 Person A- Feb 21 Male
1 Person B- Jan 15 Female
3 Person B- Feb 3 Female
One line solution, if you need reorder columns use reindex_axis
:
print pd.melt(df, id_vars=['Names', 'Gender'])
.sort_values('Names')
.assign(Names = lambda x: x['Names'] + '- ' + x['variable'])
.drop('variable', axis=1)
.reindex_axis(['Names','value','Gender'], axis=1)
Names value Gender
0 Person A- Jan 5 Male
2 Person A- Feb 21 Male
1 Person B- Jan 15 Female
3 Person B- Feb 3 Female
Upvotes: 3