planaria
planaria

Reputation: 47

How to plot yearly dataset using pandas?

I want to plot line graph of yearly usage of {a, b} pairs.

X axis should be year, and y axis should be usage.

Here is my dataset.

a   b       year    usage   
a2  10104   2011    7.292787
a0  10104   2012    43.545533
b0  12011   2009    96.130359
b1  12011   2009    7.658487
b1  14102   2010    6.975572

Upvotes: 2

Views: 177

Answers (2)

jezrael
jezrael

Reputation: 862591

Use if no duplicates in pairs a, b vs year:

s = df.set_index(['year','a','b'])['usage']
      .unstack([1,2])
      .sort_‌​index(axis=1)
      .interpolate('index')
s.index = s.index.astype(str)
s.plot()

Upvotes: 1

piRSquared
piRSquared

Reputation: 294258

Use pivot_table to rearrange data such that I have combinations of a and b in the columns and year in the index.

However, there may be missing data once we've pivoted. By using interpolate with the index parameter, we fill in missing data with the interpolated values while maintaining proportionality with the relative changes in the index value. This matters when the index values are not uniformly separated, eg [2005, 2006, 2008]. If the index is uniformly separated, then using index will not harm as it would be the same as the default.

d1 = df.pivot_table('usage', 'year', ['a', 'b']).interpolate('index')
d1.index = pd.PeriodIndex(d1.index, freq='A')
d1.plot()

enter image description here


Response to @jezrael's comments

It is beneficial to observe what the data looks like without interpolate

d1 = df.pivot_table('usage', 'year', ['a', 'b'])
d1.index = pd.PeriodIndex(d1.index, freq='A')
d1

a            a0        a2         b0        b1          
b         10104     10104      12011     12011     14102
year                                                    
2009        NaN       NaN  96.130359  7.658487       NaN
2010        NaN       NaN        NaN       NaN  6.975572
2011        NaN  7.292787        NaN       NaN       NaN
2012  43.545533       NaN        NaN       NaN       NaN

In this case, we don't see what could happen. But I imagined that there was much more data where we'd see gaps of NaN between sparse data. Instead of forward filling, we can fill proportionately.

d1 = df.pivot_table('usage', 'year', ['a', 'b']).interpolate('index')
d1.index = pd.PeriodIndex(d1.index, freq='A')
d1

a            a0        a2         b0        b1          
b         10104     10104      12011     12011     14102
year                                                    
2009        NaN       NaN  96.130359  7.658487       NaN
2010        NaN       NaN  96.130359  7.658487  6.975572
2011        NaN  7.292787  96.130359  7.658487  6.975572
2012  43.545533  7.292787  96.130359  7.658487  6.975572

Now again, this would show its benefit only with more data. As with this data, we only get to observe limited forward filling.


Regarding mean
Note that I use pivot_table and one of the things pivot_table handles is if there are duplicates within the index/column specification. It will aggregate all values that fall into that cell with the aggfunc. By default that function is mean. However, in this case, per the data that has been presented, there are no duplicate a/b/year combinations. As such, no aggregation took place.

Upvotes: 3

Related Questions