Reputation: 16935
I have time series data which are multi-indexed on (Year, Month) as seen here:
print(df.index)
print(df)
MultiIndex(levels=[[2016, 2017], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0], [2, 3, 4, 5, 6, 7, 8, 9]],
           names=['Year', 'Month'])
                Value
Year Month
2016 3      65.018150
     4      63.130035
     5      71.071254
     6      72.127967
     7      67.357795
     8      66.639228
     9      64.815232
     10     68.387698
I want to do very basic linear regression on these time series data. Because pandas.DataFrame.plot does not do any regression, I intend to use Seaborn to do my plotting.
I attempted to do this by using lmplot:
sns.lmplot(x=("Year", "Month"), y="Value", data=df, fit_reg=True)
but I get an error:
TypeError: '>' not supported between instances of 'str' and 'tuple'
This is particularly interesting to me because all elements in df.index.levels[:] are of type numpy.int64, and all elements in df.index.labels[:] are of type numpy.int8.
Why am I receiving this error? How can I resolve it?
Upvotes: 4
Views: 5185
Reputation: 339150
You can use reset_index to turn the dataframe's index levels into columns. Plotting DataFrame columns is then straightforward with seaborn.
I guess the reason to use lmplot would be to show different regressions for different years (otherwise a regplot may be better suited), so the "Year" column can be used as hue.
import numpy as np
import pandas as pd
import seaborn as sns  # seaborn.apionly was removed in later seaborn releases
import matplotlib.pyplot as plt
iterables = [[2016, 2017], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
index = pd.MultiIndex.from_product(iterables, names=['Year', 'Month'])
df = pd.DataFrame({"values":np.random.rand(24)}, index=index)
df2 = df.reset_index() # or, df.reset_index(inplace=True) if df is not required otherwise
g = sns.lmplot(x="Month", y="values", data=df2, hue="Year")
plt.show()
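If a single regression over the whole series is wanted instead of one per year, the regplot mentioned above could be used along these lines. This is only a sketch: the column name t and the "months since start" encoding are assumptions made for illustration, not part of the original answer.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Same kind of sample data as above: random values on a (Year, Month) index.
index = pd.MultiIndex.from_product(
    [[2016, 2017], range(1, 13)], names=["Year", "Month"]
)
df = pd.DataFrame({"values": np.random.rand(24)}, index=index)
df2 = df.reset_index()

# Hypothetical helper column: a running month counter, so that
# December 2016 (12) is directly followed by January 2017 (13).
df2["t"] = (df2["Year"] - df2["Year"].min()) * 12 + df2["Month"]

# One regression line across both years.
ax = sns.regplot(x="t", y="values", data=df2)
ax.set_xlabel("Months since start of first year")
```

Call plt.show() afterwards to display the figure, as in the example above.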
Upvotes: 10
Reputation: 210832
Consider the following approach:
df['x'] = df.index.get_level_values(0) + df.index.get_level_values(1)/100
yields:
In [49]: df
Out[49]:
                Value        x
Year Month
2016 3      65.018150  2016.03
     4      63.130035  2016.04
     5      71.071254  2016.05
     6      72.127967  2016.06
     7      67.357795  2016.07
     8      66.639228  2016.08
     9      64.815232  2016.09
     10     68.387698  2016.10
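One thing to keep in mind: Year + Month/100 is evenly spaced only within a single year, since December 2016 maps to 2016.12 and January 2017 to 2017.01. If the data span several years, a fractional-year encoding may be a better regression axis. A minimal sketch, assuming the same (Year, Month) index; the frame demo and the column x_even are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample crossing a year boundary (values are arbitrary).
index = pd.MultiIndex.from_tuples(
    [(2016, 11), (2016, 12), (2017, 1), (2017, 2)], names=["Year", "Month"]
)
demo = pd.DataFrame({"Value": [1.0, 2.0, 3.0, 4.0]}, index=index)

year = demo.index.get_level_values("Year")
month = demo.index.get_level_values("Month")

# Fractional year: every month advances x by exactly 1/12.
demo["x_even"] = np.asarray(year + (month - 1) / 12)

# Spacing between consecutive rows, to confirm it is uniform.
steps = np.diff(demo["x_even"].to_numpy())
```

With this encoding the step from December to January is the same as between any other two consecutive months, so the fitted slope can be read as change per year.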
Let's prepare the x-tick labels:
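import matplotlib.pyplot as plt
import seaborn as sns

labels = df.index.get_level_values(0).astype(str) + '-' + \
         df.index.get_level_values(1).astype(str).str.zfill(2)
sns.lmplot(x='x', y='Value', data=df, fit_reg=True)
ax = plt.gca()
ax.set_xticks(df['x'])  # pin the ticks to the data points so the labels line up
ax.set_xticklabels(labels)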
Result:
Upvotes: 4