user3783364

Reputation: 1

Regression in pandas

I have two separate databases: a temperature database with hourly data and a house database with minute-by-minute data for HVAC usage. I'm trying to plot the HVAC data as a temperature series over a week, a month, and a year, but since its sampling interval doesn't match the temperature database's, I'm having trouble. I've tried making a least-squares fit, but (a) I can't figure out how to do one in pandas and (b) it gets really inaccurate after a day or two. Any suggestions?

Upvotes: 0

Views: 453

Answers (1)

CT Zhu

Reputation: 54380

A pandas time series is perfect for this application. You can merge series with different sampling frequencies and pandas will align them on their timestamps. Then you can downsample the data and perform a regression, e.g., with statsmodels. A mock-up example:

In [288]:

import numpy as np
import pandas as pd

# Two random series sharing a start date: one daily, one hourly
idx1 = pd.date_range('2001/01/01', periods=10, freq='D')
idx2 = pd.date_range('2001/01/01', periods=500, freq='h')
df1 = pd.DataFrame(np.random.random(10), columns=['val1'], index=idx1)
df2 = pd.DataFrame(np.random.random(500), columns=['val2'], index=idx2)
In [291]:

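# Inner join on the index keeps only timestamps present in both frames
df3 = pd.merge(df1, df2, left_index=True, right_index=True, how='inner')
# Recent pandas returns a Resampler here, so aggregate explicitly with mean() (the old default)
df4 = df3.resample(rule='D').mean()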
In [292]:

print(df4)
                val1      val2
2001-01-01  0.399901  0.244800
2001-01-02  0.014448  0.423780
2001-01-03  0.811747  0.070047
2001-01-04  0.595556  0.679096
2001-01-05  0.218412  0.116764
2001-01-06  0.961310  0.040317
2001-01-07  0.058964  0.606843
2001-01-08  0.075129  0.407842
2001-01-09  0.833003  0.751287
2001-01-10  0.070072  0.559986

[10 rows x 2 columns]
In [294]:

import statsmodels.formula.api as smf
mod = smf.ols(formula='val1 ~ val2', data=df4)
res = mod.fit()
print(res.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   val1   R-squared:                       0.061
Model:                            OLS   Adj. R-squared:                 -0.056
Method:                 Least Squares   F-statistic:                    0.5231
Date:                Fri, 27 Jun 2014   Prob (F-statistic):              0.490
Time:                        10:46:34   Log-Likelihood:                -3.3643
No. Observations:                  10   AIC:                             10.73
Df Residuals:                       8   BIC:                             11.33
Df Model:                           1                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      0.5405      0.224      2.417      0.042         0.025     1.056
val2          -0.3502      0.484     -0.723      0.490        -1.467     0.766
==============================================================================
Omnibus:                        3.509   Durbin-Watson:                   2.927
Prob(Omnibus):                  0.173   Jarque-Bera (JB):                1.232
Skew:                           0.399   Prob(JB):                        0.540
Kurtosis:                       1.477   Cond. No.                         4.69
==============================================================================
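
For the setup in the question (hourly temperature, minute-by-minute HVAC), one option is to downsample the minute data to hourly before joining, so that every hour contributes rather than only the timestamps that happen to match. The following is a minimal sketch under stated assumptions: the data is synthetic, the column names `temp` and `hvac_on` are hypothetical, and `np.polyfit` stands in for the statsmodels fit above as a plain least-squares line.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the two databases (column names are hypothetical).
rng = np.random.default_rng(0)
hours = pd.date_range("2001-01-01", periods=7 * 24, freq="h")
minutes = pd.date_range("2001-01-01", periods=7 * 24 * 60, freq="min")

temp = pd.DataFrame({"temp": 10 + 5 * rng.random(len(hours))}, index=hours)
hvac = pd.DataFrame({"hvac_on": rng.integers(0, 2, len(minutes))}, index=minutes)

# Downsample the minute data to hourly means (the fraction of each hour
# the HVAC was on), so both frames share the same hourly index.
hvac_hourly = hvac.resample("h").mean()

# Index-aligned join; the inner join keeps only hours present in both frames.
hourly = temp.join(hvac_hourly, how="inner")

# Coarser view for longer plots: daily means of both columns.
daily = hourly.resample("D").mean()

# Least-squares line hvac_on ~ temp (degree-1 polynomial fit).
slope, intercept = np.polyfit(hourly["temp"], hourly["hvac_on"], deg=1)
```

The same `resample(...).mean()` step with a coarser rule gives the aggregates for longer plots, and `hourly` or `daily` can be passed to `smf.ols` in place of `df4` if the full regression summary is wanted.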

Upvotes: 3
