dshrikant

Reputation: 617

Stock Prediction on basis of Symbol, Date, AveragePrice

I am trying to predict stock prices for the next 7 days based on the data available for the last 5 years. The data looks like this:

Stock Symbol, date, Average Price

I am trying to apply support vector regression to this data set. I have already converted the date column to pandas datetime using data.Date = pd.to_datetime(data.Date), but I still get this error:

float() argument must be a string or a number, not 'Timestamp'.

My code is as follows

from sklearn.svm import SVR
adaniPorts = data[data.Symbol == 'ADANIPORTS']

from sklearn.model_selection import train_test_split
X = adaniPorts[['Symbol', 'Date']]
Y = adaniPorts['Average Price']
x_train, x_test, y_train, y_test = train_test_split(X, Y)

classifier = SVR().fit(x_train, y_train)

Is there any way to resolve this datetime problem?

Upvotes: 0

Views: 351

Answers (2)

yatu

Reputation: 88275

When you train the SVR you can only use numerical features. One way to include the datetime information is to convert each date to a number, for example the seconds elapsed since the first date with (data.Date - data.Date.min()).dt.total_seconds(), so that you also feed the regressor a numerical feature representing the date. Another way is to include the individual fields of the datetime object (year, month, day) as predictors.
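A minimal sketch of both options, assuming a frame with a Date column like the one in the question. The sample values and the helper name add_date_features are made up for illustration and do not come from the original post.

```python
import pandas as pd


def add_date_features(frame, date_col="Date"):
    """Return a copy of frame with numeric columns derived from date_col.

    Hypothetical helper: the name and the output column names are illustrative.
    """
    out = frame.copy()
    dates = pd.to_datetime(out[date_col])
    # Option 1: a single number, the seconds elapsed since the earliest date.
    out["seconds"] = (dates - dates.min()).dt.total_seconds()
    # Option 2: the separate calendar fields as integer predictors.
    out["year"] = dates.dt.year
    out["month"] = dates.dt.month
    out["day"] = dates.dt.day
    # The raw datetime column is what triggers the float() error, so drop it.
    return out.drop(columns=[date_col])


# Made-up sample rows standing in for the real price data.
sample = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-02", "2019-01-04"],
    "Average Price": [380.5, 382.0, 379.25],
})
features = add_date_features(sample)
```

In practice you would probably keep only one of the two representations, since both encode the same information.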

However, using an SVR for time series forecasting would make more sense if the features provided enough information to compensate for the temporal component, and it is doubtful that they do here.

Furthermore, you are using train_test_split, which by default generates random train and test subsets from the original data. This cannot be applied directly to time series data, because it assumes that the observations are unrelated to each other. When dealing with time series, the data must be split respecting the temporal order in which the values were observed, so that the model is tested only on dates later than the ones it was trained on.
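One way to do such a split is to sort by date and cut the frame at a fixed position. The sketch below assumes a Date column as in the question; the helper name chronological_split and the sample values are hypothetical.

```python
import pandas as pd


def chronological_split(frame, date_col="Date", test_fraction=0.25):
    """Split frame into (train, test) so every test row is later than every train row.

    Hypothetical helper, shown only to illustrate an order-preserving split.
    """
    ordered = frame.sort_values(date_col).reset_index(drop=True)
    # Hold out the most recent rows; always keep at least one test row.
    n_test = max(1, int(round(len(ordered) * test_fraction)))
    split_at = len(ordered) - n_test
    return ordered.iloc[:split_at], ordered.iloc[split_at:]


# Made-up rows, deliberately out of order.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-03", "2019-01-01", "2019-01-04", "2019-01-02"]),
    "Average Price": [381.0, 380.5, 379.25, 382.0],
})
train, test = chronological_split(sample)
```

scikit-learn also offers train_test_split(..., shuffle=False) for a single ordered split on already sorted data, and TimeSeriesSplit for several successive train/test folds.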

I suggest you also take a look at recurrent neural networks or ARIMA models.

Upvotes: 1

Tzomas

Reputation: 704

As the other answer says, only numerical features are supported. scikit-learn tries to convert string features to numbers automatically, but that only works when the strings hold numeric values. You have several options. The first is the one suggested there: transform each date to a number of seconds. I think it is better, though, to one-hot encode each part of the date.

data['day'] = data.Date.dt.day
data['month'] = data.Date.dt.month
data['year'] = data.Date.dt.year

With this you have day, month and year in separate columns. Now you can one-hot encode them: build a vector of 0s with one position per possible value and put a 1 at the position of the value you are encoding. For example, the 3rd day of a month becomes:

[0,0,1,0,....,0] -> 1x31

To do that with pandas you can use something like this.

data = pd.concat([data, pd.get_dummies(data.year, prefix='year')], axis=1, sort=False)

data = pd.concat([data, pd.get_dummies(data.month, prefix='month')], axis=1, sort=False)

data = pd.concat([data, pd.get_dummies(data.day, prefix='day')], axis=1, sort=False)

It can also be useful to add the day of the week, because activity stops on weekends.

data['week_day'] = data.Date.dt.dayofweek

Before passing the data to SVR, drop the Date column:

data.drop(['Date'], axis=1, inplace=True)
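Putting the steps together, a rough end-to-end sketch could look like the following. It assumes synthetic prices generated only for illustration and an 80/20 split in date order; neither comes from the original post. It also leaves the string Symbol column out of the features, since that column is not numeric either.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVR

# Synthetic stand-in for the real data: 60 daily rows with made-up prices.
data = pd.DataFrame({
    "Symbol": "ADANIPORTS",
    "Date": pd.date_range("2019-01-01", periods=60, freq="D"),
    "Average Price": np.linspace(380.0, 400.0, num=60),
})

# Split the date into calendar parts, including the day of the week.
parts = pd.DataFrame({
    "year": data.Date.dt.year,
    "month": data.Date.dt.month,
    "day": data.Date.dt.day,
    "week_day": data.Date.dt.dayofweek,
})

# One-hot encode every part; the result holds only 0/1 floats,
# with no Date and no Symbol column.
X = pd.get_dummies(parts, columns=list(parts.columns)).astype(float)
y = data["Average Price"]

# Train on the first 80% of the rows (already in date order), test on the rest.
split_at = int(len(X) * 0.8)
model = SVR().fit(X.iloc[:split_at], y.iloc[:split_at])
predictions = model.predict(X.iloc[split_at:])
```

Note that with one-hot encoded dates, the test period can contain values (a new month or year, for example) that never appeared during training, so the model has nothing to learn for those columns. This is one reason the sequence models mentioned in this thread may suit the task better.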

I hope this works

PS: I would recommend an LSTM (a neural network) or ARIMA (a statistical model) for this task.

Upvotes: 0
