Reputation: 123
I am trying to unstack a column in python but it isn't quite doing what I am expecting. My table (called df) looks similar to this:
station_id year Day1 Day2
210018 1916 4 7
1917 3 9
256700 1916 NaN 8
1917 6 9
I want to unstack by year so that all the days from a year per stationn are in one row. The two days from 1916 would then start first followed by the 2 days from 1917 for station 210018 and 256700.
An example would look like this:
station_id 1916 1917
210018 4 7 3 9
256700 NaN 8 6 9
I am trying to use this code:
df2=df.unstack(level='year')
df2.columns=df2.columns.swaplevel(0,1)
df2=df2.sort(axis=1)
I get an error that says AttributeError: 'Series' object has no attribute 'columns'
.
Any help would be appreciated.
Upvotes: 0
Views: 710
Reputation: 880329
You need to make year
an index before you call unstack:
try:
# for Python2
from cStringIO import StringIO
except ImportError:
# for Python3
from io import StringIO
import pandas as pd
text = '''\
station_id year Day1 Day2
210018 1916 4 7
210018 1917 3 9
256700 1916 NaN 8
256700 1917 6 9'''
df = pd.read_table(StringIO(text), sep='\s+')
df = df.set_index(['station_id', 'year'])
df2 = df.unstack(level='year')
df2.columns = df2.columns.swaplevel(0,1)
df2 = df2.sort(axis=1)
print(df2)
yields
year 1916 1917
Day1 Day2 Day1 Day2
station_id
210018 4 7 3 9
256700 NaN 8 6 9
whereas, if year
is a column, and not an index, then
df = pd.read_table(StringIO(text), sep='\s+')
df = df.set_index(['station_id'])
df2 = df.unstack(level='year')
df2.columns = df2.columns.swaplevel(0,1)
df2 = df2.sort(axis=1)
leads to AttributeError: 'Series' object has no attribute 'columns'
.
The level='year'
is ignored in df.unstack(level='year')
when df
does not have a index level named year
(or even, say, blah
):
In [102]: df
Out[102]:
year Day1 Day2
station_id
210018 1916 4 7
210018 1917 3 9
256700 1916 NaN 8
256700 1917 6 9
In [103]: df.unstack(level='blah')
Out[103]:
station_id
year 210018 1916
210018 1917
256700 1916
256700 1917
Day1 210018 4
210018 3
256700 NaN
256700 6
Day2 210018 7
210018 9
256700 8
256700 9
dtype: float64
This is the source of the surprising error.
Upvotes: 3