spotter
spotter

Reputation: 123

Using Unstack in Python

I am trying to unstack a column in python but it isn't quite doing what I am expecting. My table (called df) looks similar to this:

station_id   year     Day1   Day2 
 210018       1916      4        7
              1917      3        9 
 256700       1916     NaN       8
              1917      6        9

I want to unstack by year so that all the days from a year per stationn are in one row. The two days from 1916 would then start first followed by the 2 days from 1917 for station 210018 and 256700.

An example would look like this:

station_id            1916       1917
210018                4   7      3  9 
256700                NaN  8     6   9   

I am trying to use this code:

df2=df.unstack(level='year')
df2.columns=df2.columns.swaplevel(0,1)
df2=df2.sort(axis=1)

I get an error that says AttributeError: 'Series' object has no attribute 'columns'.

Any help would be appreciated.

Upvotes: 0

Views: 710

Answers (1)

unutbu
unutbu

Reputation: 880329

You need to make year an index before you call unstack:

try:
    # for Python2
    from cStringIO import StringIO 
except ImportError:
    # for Python3
    from io import StringIO

import pandas as pd


text = '''\
station_id   year     Day1   Day2 
 210018       1916      4        7
 210018       1917      3        9 
 256700       1916     NaN       8
 256700       1917      6        9'''

df = pd.read_table(StringIO(text), sep='\s+')
df = df.set_index(['station_id', 'year'])
df2 = df.unstack(level='year')
df2.columns = df2.columns.swaplevel(0,1)
df2 = df2.sort(axis=1)
print(df2)

yields

year       1916      1917     
           Day1 Day2 Day1 Day2
station_id                    
210018        4    7    3    9
256700      NaN    8    6    9

whereas, if year is a column, and not an index, then

df = pd.read_table(StringIO(text), sep='\s+')
df = df.set_index(['station_id'])   
df2 = df.unstack(level='year')
df2.columns = df2.columns.swaplevel(0,1)
df2 = df2.sort(axis=1)

leads to AttributeError: 'Series' object has no attribute 'columns'.


The level='year' is ignored in df.unstack(level='year') when df does not have a index level named year (or even, say, blah):

In [102]: df
Out[102]: 
            year  Day1  Day2
station_id                  
210018      1916     4     7
210018      1917     3     9
256700      1916   NaN     8
256700      1917     6     9

In [103]: df.unstack(level='blah')
Out[103]: 
      station_id
year  210018        1916
      210018        1917
      256700        1916
      256700        1917
Day1  210018           4
      210018           3
      256700         NaN
      256700           6
Day2  210018           7
      210018           9
      256700           8
      256700           9
dtype: float64

This is the source of the surprising error.

Upvotes: 3

Related Questions