Reputation: 3607
When stacking a pandas DataFrame
, a Series
is returned. Normally after I stack a DataFrame
, I convert it back into a DataFrame
. However, the default names coming from the stacked data make renaming the columns a bit hacky. What I'm looking for is an easier/built-in way to give columns sensible names after stacking.
E.g., for the following DataFrame
:
In [64]: df = pd.DataFrame({'id':[1,2,3],
...: 'date':['2015-09-31']*3,
...: 'value':[100, 95, 42],
...: 'value2':[200, 57, 27]}).set_index(['id','date'])
In [65]: df
Out[65]:
value value2
id date
1 2015-09-31 100 200
2 2015-09-31 95 57
3 2015-09-31 42 27
I stack and convert it back to a DataFrame
like so:
In [68]: df.stack().reset_index()
Out[68]:
id date level_2 0
0 1 2015-09-31 value 100
1 1 2015-09-31 value2 200
2 2 2015-09-31 value 95
3 2 2015-09-31 value2 57
4 3 2015-09-31 value 42
5 3 2015-09-31 value2 27
So in order to name these columns appropriately I would need to do something like this:
In [72]: stacked = df.stack()
In [73]: stacked
Out[73]:
id date
1 2015-09-31 value 100
value2 200
2 2015-09-31 value 95
value2 57
3 2015-09-31 value 42
value2 27
dtype: int64
In [74]: stacked.index.set_names('var_name', level=len(stacked.index.names)-1, inplace=True)
In [88]: stacked.reset_index().rename(columns={0:'value'})
Out[88]:
id date var_name value
0 1 2015-09-31 value 100
1 1 2015-09-31 value2 200
2 2 2015-09-31 value 95
3 2 2015-09-31 value2 57
4 3 2015-09-31 value 42
5 3 2015-09-31 value2 27
Ideally, the solution would look something like this:
df.stack(new_index_name='var_name', new_col_name='value')
But looking at the docs it doesn't look like stack
takes any such arguments. Is there an easier/built-in way in pandas to deal with this workflow?
Upvotes: 40
Views: 35283
Reputation: 5911
To avoid a phantom column name when calling stack
, just rename the column axis beforehand:
df = pd.DataFrame({col: range(3) for col in list("ABC")})
df.rename_axis(columns="lol_goodbye_columns").stack()
lol_goodbye_columns
0 A 0
B 0
C 0
1 A 1
B 1
C 1
2 A 2
B 2
C 2
dtype: int64
Upvotes: 1
Reputation: 896
Why not something like this?
Sometimes melt
is great,
but sometimes you want to keep your index,
and/or you want to have an index on that new column.
This is like @krassowski's answer,
but it doesn't require you to know the names of df's indices in advance.
df.stack().rename_axis([*df.index.names, "var_name"]).rename("value")
Upvotes: 4
Reputation: 15379
A pipe-ing friendly alternative to chrisb's answer:
df.stack().rename_axis(['id', 'date', 'var_name']).rename('value').reset_index()
And if explicit is better than implicit:
(
df
.stack()
.rename_axis(index={'id': 'id', 'date': 'date', None: 'var_name'})
.rename('value')
.reset_index()
)
When using the dict mapper, you can skip the names which should stay the same:
df.stack().rename_axis(index={None: 'var_name'}).rename('value').reset_index()
Upvotes: 12
Reputation: 879869
pd.melt
is often useful for converting DataFrames from "wide" to "long" format. You could use pd.melt
here if you convert the id
and date
index levels to columns first:
In [56]: pd.melt(df.reset_index(), id_vars=['id', 'date'], value_vars=['value', 'value2'], var_name='var_name', value_name='value')
Out[56]:
id date var_name value
0 1 2015-09-31 value 100
1 2 2015-09-31 value 95
2 3 2015-09-31 value 42
3 1 2015-09-31 value2 200
4 2 2015-09-31 value2 57
5 3 2015-09-31 value2 27
Upvotes: 26
Reputation: 52266
So here's one way that you may find a bit cleaner, using the fact that columns
and Series
can also carry names.
In [45]: df
Out[45]:
value value2
id date
1 2015-09-31 100 200
2 2015-09-31 95 57
3 2015-09-31 42 27
In [46]: df.columns.name = 'var_name'
In [47]: s = df.stack()
In [48]: s.name = 'value'
In [49]: s.reset_index()
Out[49]:
id date var_name value
0 1 2015-09-31 value 100
1 1 2015-09-31 value2 200
2 2 2015-09-31 value 95
3 2 2015-09-31 value2 57
4 3 2015-09-31 value 42
5 3 2015-09-31 value2 27
Upvotes: 32