Reputation: 2075
I have a DataFrame that contains 1-second intervals of the EUR_USD currency pair. In theory it could be any interval, though; with 1-minute intervals, for example, it would look like this:
2015-11-10 01:00:00+01:00 1.07616
2015-11-10 01:01:00+01:00 1.07605
2015-11-10 01:02:00+01:00 1.07590
2015-11-10 01:03:00+01:00 1.07592
2015-11-10 01:04:00+01:00 1.07583
I'd like to use linear regression to draw a trend line from the data in the DataFrame, but I'm not sure what the best way is to do that with a time series, especially one with such small intervals.
So far I've experimented with replacing the time (just to show where I'd like to go with it) with a list ranging from 0 to the length of the time series.
x = list(range(0, len(df.index.tolist()), 1))
y = df["closeAsk"].tolist()
Using numpy to do the math magic
fit = np.polyfit(x,y,1)
fit_fn = np.poly1d(fit)
Lastly I draw the function along with the df["closeAsk"] to make sense of the trend.
plt.plot(x,df["closeAsk"], '-')
plt.plot(x,y, 'yo', x, fit_fn(x), '--k')
plt.show()
However, the x-axis now shows just meaningless numbers. I'd like it to show the timestamps of the time series instead.
Upvotes: 11
Views: 16843
Reputation: 10379
Building on the accepted answer, here's a neat way to plot both trend and data from any pd.Series, including time series:
trend.plot(df['data'])
where trend.plot is defined as follows (generalized from the accepted answer):
def trend(s):
    x = np.arange(len(s))
    z = np.polyfit(x, s, 1)
    p = np.poly1d(z)
    t = pd.Series(p(x), index=s.index)
    return t
trend.plot = lambda s: [s.plot(), trend(s).plot()]
If you need just the trend data (not the plot):
trendline = trend(df['data'])
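For a quick check that the returned trend keeps the original timestamps, here is a minimal self-contained sketch. It reuses the five sample rows from the question, with the UTC offset dropped for simplicity (that is my assumption, not part of the original data), and the name residuals is just an illustrative choice:

```python
import numpy as np
import pandas as pd


def trend(s):
    # Same positional fit as above: regress the values on 0, 1, 2, ...
    x = np.arange(len(s))
    p = np.poly1d(np.polyfit(x, s, 1))
    return pd.Series(p(x), index=s.index)


# Sample rows copied from the question (timezone offset omitted here).
index = pd.date_range("2015-11-10 01:00", periods=5, freq="1min")
close_ask = pd.Series([1.07616, 1.07605, 1.07590, 1.07592, 1.07583],
                      index=index, name="closeAsk")

trendline = trend(close_ask)
# What is left of the data once the linear trend is subtracted.
residuals = close_ask - trendline
```

Because trendline shares the DatetimeIndex of the input, plotting it with trendline.plot() should put the dates on the x-axis rather than row numbers.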
Upvotes: 0
Reputation: 1
You can create a numpy linspace for the x-values with the same length as your data points, like so:
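import numpy as np
import seaborn as sb
y = df["closeAsk"].dropna() # or .bfill() to fill gaps instead of dropping them
x = np.linspace(1, len(y), num=len(y))
sb.regplot(x=x, y=y) # x and y passed by keyword, as newer seaborn releases expect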
Upvotes: 0
Reputation: 5364
To elaborate on my comment:
Say you have some evenly spaced time series data, time, and some correlated data, data, as you've laid out in your question.
time = pd.date_range('9:00', '10:00', freq='1s')
data = np.cumsum(np.random.randn(time.size))
df = pd.DataFrame({'time': time,
                   'data': data})
As you've shown, you can do a linear fit of the data with np.polyfit and create the trend line with np.poly1d.
x = np.arange(time.size) # = array([0, 1, 2, ..., 3598, 3599, 3600])
fit = np.polyfit(x, df['data'], 1)
fit_fn = np.poly1d(fit)
Then plot the data and the fit with df['time'] as the x-axis.
plt.plot(df['time'], fit_fn(x), 'k-')
plt.plot(df['time'], df['data'], 'go', ms=2)
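The fit above uses the row position as x, which is fine as long as the samples are evenly spaced. If there are gaps, one option is to regress on elapsed time instead. The following is only a sketch under that assumption: time_trend is a hypothetical helper name, and the last sample timestamp is made up to create a gap.

```python
import numpy as np
import pandas as pd


def time_trend(s):
    """Fit a straight line to s against its DatetimeIndex, not its row position."""
    # Seconds elapsed since the first timestamp; uneven spacing is handled naturally.
    elapsed = (s.index - s.index[0]).total_seconds().to_numpy()
    slope, intercept = np.polyfit(elapsed, s.to_numpy(), 1)
    # Put the fitted values back on the original index so plots keep real dates.
    return pd.Series(slope * elapsed + intercept, index=s.index), slope


# Hypothetical, unevenly spaced sample (values borrowed from the question,
# last timestamp invented to leave a gap).
idx = pd.to_datetime([
    "2015-11-10 01:00:00", "2015-11-10 01:01:00",
    "2015-11-10 01:02:00", "2015-11-10 01:03:00", "2015-11-10 01:10:00",
])
prices = pd.Series([1.07616, 1.07605, 1.07590, 1.07592, 1.07583], index=idx)

trendline, slope_per_second = time_trend(prices)
```

The slope then has a physical unit (price change per second), and trendline can be passed to plt.plot(trendline.index, trendline) in the same way as the fit above.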
Upvotes: 17