vandelay

Reputation: 2075

Linear regression with pandas time series

I have a DataFrame that contains 1-second intervals of the EUR_USD currency pair. In theory it could be any interval, though; for example, it could look like this:

2015-11-10 01:00:00+01:00    1.07616
2015-11-10 01:01:00+01:00    1.07605
2015-11-10 01:02:00+01:00    1.07590
2015-11-10 01:03:00+01:00    1.07592
2015-11-10 01:04:00+01:00    1.07583

I'd like to use linear regression to draw a trend line from the data in the DataFrame, but I'm not sure what the best way to do that is with time series, especially ones with such small intervals.

So far I've messed around by replacing the time with (and this is just to show where I'd like to go with it) a list ranging from 0 to the length of the time series.

x = list(range(0, len(df.index.tolist()), 1))
y = df["closeAsk"].tolist()

Then I use numpy to do the math magic:

fit = np.polyfit(x,y,1)
fit_fn = np.poly1d(fit)

Lastly I draw the function along with the df["closeAsk"] to make sense of the trend.

plt.plot(x,df["closeAsk"], '-')
plt.plot(x,y, 'yo', x, fit_fn(x), '--k')
plt.show()

However, the x-axis now shows just meaningless numbers. Instead, I'd like it to show the timestamps of the time series.

Upvotes: 11

Views: 16843

Answers (4)

miraculixx

Reputation: 10379

Building on the accepted answer, here's a neat way to plot both trend and data from any pd.Series, including time series:

trend.plot(df['data'])

Where trend.plot is defined as follows (generalized from the accepted answer):

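import numpy as np
import pandas as pd

def trend(s):
    # Fit a straight line against the positions 0..n-1
    x = np.arange(len(s))
    z = np.polyfit(x, s, 1)
    p = np.poly1d(z)
    # Put the fitted values back on the original index
    t = pd.Series(p(x), index=s.index)
    return t

# Attach a helper that draws the data and its trend on the same axes
trend.plot = lambda s: [s.plot(), trend(s).plot()]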

If you need just the trend data (not the plot):

trendline = trend(df['data'])
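
As a minimal sketch of why this keeps the time axis, the block below runs the same idea on a small, made-up one-minute series. The values, the name prices, and the same_index check are assumptions for illustration only and are not part of the answer above.

```python
import numpy as np
import pandas as pd

def trend(s):
    # Fit a degree-1 polynomial against positions 0..n-1 and
    # return the fitted values on the original index.
    x = np.arange(len(s))
    coeffs = np.polyfit(x, s.to_numpy(), 1)
    return pd.Series(np.polyval(coeffs, x), index=s.index)

# Hypothetical one-minute series with invented values.
idx = pd.date_range("2015-11-10 01:00", periods=5, freq="min")
prices = pd.Series([1.0, 1.5, 2.0, 2.5, 3.0], index=idx)

trendline = trend(prices)

# The trend line carries the same DatetimeIndex as the data,
# so both can be drawn against the same time axis.
same_index = trendline.index.equals(prices.index)
```

Because the returned Series shares the input's index, calling prices.plot() and then trendline.plot() should put both curves on a timestamp axis rather than on positions.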

Upvotes: 0

Björn

Reputation: 1

You can create a numpy linspace for the x-values with the same length as your data points, like so:

import numpy as np
import seaborn as sb

y = df["closeAsk"].dropna()  # or .fillna(method='bfill')
x = np.linspace(1, len(y), num=len(y))

# Recent seaborn versions require x and y as keyword arguments
sb.regplot(x=x, y=y)

Upvotes: 0

knagaev

Reputation: 2967

Maybe you will be happy with seaborn? Please try seaborn.regplot:

Plot the relationship between two variables in a DataFrame

Upvotes: 0

lanery

Reputation: 5364

To elaborate on my comment:

Say you have some evenly spaced time series data, time, and some correlated data, data, as you've laid out in your question.

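import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# One sample per second from 9:00 to 10:00, with a random walk as data
time = pd.date_range('9:00', '10:00', freq='1s')
data = np.cumsum(np.random.randn(time.size))

df = pd.DataFrame({'time' : time,
                   'data' : data})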

As you've shown, you can do a linear fit of the data with np.polyfit and create the trend line with np.poly1d.

x = np.arange(time.size) # = array([0, 1, 2, ..., 3598, 3599, 3600])
fit = np.polyfit(x, df['data'], 1)
fit_fn = np.poly1d(fit)

Then plot the data and the fit with df['time'] as the x-axis.

plt.plot(df['time'], fit_fn(x), 'k-')
plt.plot(df['time'], df['data'], 'go', ms=2)
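
The fit above relies on the samples being evenly spaced, since it uses positions as x. If the timestamps are irregular, one option (a sketch, not part of the original answer) is to fit against elapsed seconds instead. The series s and its values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical irregularly spaced series (values invented for illustration).
idx = pd.to_datetime([
    "2015-11-10 01:00:00",
    "2015-11-10 01:01:00",
    "2015-11-10 01:03:00",
    "2015-11-10 01:07:00",
])
s = pd.Series([1.0, 2.0, 4.0, 8.0], index=idx)

# Elapsed seconds since the first observation serve as the numeric x-values.
elapsed = np.asarray((s.index - s.index[0]).total_seconds())

# Degree-1 fit: slope is in data units per second.
slope, intercept = np.polyfit(elapsed, s.to_numpy(), 1)

# Fitted values, placed back on the original timestamps.
trend = pd.Series(slope * elapsed + intercept, index=s.index)
```

Plotting s and trend (for example with s.plot() followed by trend.plot()) should then show both against the real timestamps, with gaps in the data reflected in the spacing.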


Upvotes: 17
