I am getting the following error:
All I am doing is running this piece of code. I have used this dataset, https://www.kaggle.com/camnugent/sandp500, but I have tried a few others as well, and all of them return this message.
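import pandas as pd
from fbprophet import Prophet  # fbprophet 0.7.1

pandas_df = pd.read_csv(file_path)  # file_path: path to the downloaded CSV
data_to_use = pandas_df.filter(['datetime', 'close'])  # keep date and closing price
data_to_use.columns = ['ds', 'y']  # Prophet expects columns named ds and y
data_to_use['ds'] = pd.to_datetime(data_to_use['ds'])
model = Prophet()
model.fit(data_to_use)
future_dates = model.make_future_dataframe(periods=365)  # extend 365 days past the data
prediction = model.predict(future_dates)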
I have installed pystan and fbprophet in a new conda environment: Python 3.7.10, pystan 2.19.1.1, fbprophet 0.7.1.
I am not sure how to proceed, so any advice would be appreciated.
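One thing worth ruling out in the snippet above: `DataFrame.filter(items)` silently skips labels that are not in the frame, and the Kaggle S&P 500 file uses a `date` column (as the answer below assumes) rather than `datetime`. In that case `filter` returns only `close`, and assigning two names to a one-column frame raises a `ValueError` before Prophet ever runs. The sketch below is a hypothetical helper (`select_prophet_columns` is not part of pandas or Prophet) that fails early with the list of available columns; the two-row frame uses made-up prices.

```python
import pandas as pd


def select_prophet_columns(df, date_col, value_col):
    """Return df[[date_col, value_col]] renamed to Prophet's ds/y columns.

    Unlike DataFrame.filter, which silently skips missing labels, this
    raises a KeyError naming any column that is not present.
    """
    missing = [c for c in (date_col, value_col) if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}; available: {list(df.columns)}")
    out = df[[date_col, value_col]].rename(columns={date_col: "ds", value_col: "y"})
    # assign() returns a new DataFrame with ds parsed as datetimes
    return out.assign(ds=pd.to_datetime(out["ds"]))


# Hypothetical two-row frame shaped like the S&P 500 CSV (made-up prices).
sample = pd.DataFrame({
    "date": ["2013-02-08", "2013-02-11"],
    "close": [10.0, 11.0],
    "Name": ["AAA", "AAA"],
})

print(sample.filter(["datetime", "close"]).columns.tolist())  # ['close']
prepared = select_prophet_columns(sample, "date", "close")
print(prepared.columns.tolist())  # ['ds', 'y']
```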
I'm unable to reproduce your error, but I believe I know why the fit is failing.
The data you are trying to fit are closing stock prices for more than 500 stocks. Plotting them with data_to_use.plot.scatter(x='ds', y='y', s=0.01) gives:
[Figure: scatter plot of y (closing price) against ds (date) for all stocks]
I believe the fit is failing because it is struggling to converge to a reasonable solution, as mentioned in this pystan discourse thread. When I naively fit these data, the fit doesn't fail, but it produces a nearly horizontal prediction line near y = 100 with very large error bars.
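To see why the combined data are hard to fit, it helps to look at how many observations share each timestamp. Prophet fits a single time series to the ds/y columns, so in the long format of this CSV (one row per date and stock) every stock's price becomes a competing observation for the same date. The sketch below uses a made-up three-stock frame (tickers AAA, BBB, CCC and their prices are hypothetical) to show the count and spread per date; running the same groupby on the real `pandas_df` should show roughly 500 rows per date.

```python
import pandas as pd

# Hypothetical long-format prices: one row per (date, ticker), like the S&P 500 CSV.
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2013-02-08"] * 3 + ["2013-02-11"] * 3),
    "Name": ["AAA", "BBB", "CCC"] * 2,
    "close": [15.0, 250.0, 40.0, 15.5, 248.0, 41.0],
})

# Number of y values per ds, and how far apart they are.
per_date = long_df.groupby("date")["close"].agg(["count", "min", "max"])
print(per_date)
```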
Depending on what you're trying to do, I'd suggest either picking a single stock by Name, for example:
# keep only the rows for one ticker, then its date and closing price
data_to_use_pcln = pandas_df[pandas_df.Name.eq('PCLN')].filter(['date', 'close'])
data_to_use_pcln.columns = ['ds', 'y']
data_to_use_pcln['ds'] = pd.to_datetime(data_to_use_pcln['ds'])
or summing all stock prices together for each day like this:
# sum the closing prices of all stocks for each date
summed_data = pandas_df.groupby('date')['close'].sum().reset_index()
summed_data.columns = ['ds', 'y']
summed_data['ds'] = pd.to_datetime(summed_data['ds'])
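Both options can be wrapped in one helper. This is a sketch that assumes the `date`, `close` and `Name` columns used in the snippets above; `to_prophet_frame` is a hypothetical name, and the prices in the example frame are made up.

```python
import pandas as pd


def to_prophet_frame(df, ticker=None):
    """Build a Prophet-style ds/y frame from long-format stock data.

    ticker=None sums 'close' across all tickers per date; otherwise only
    the rows whose 'Name' equals ticker are used.
    """
    if ticker is not None:
        series = df.loc[df["Name"].eq(ticker), ["date", "close"]]
    else:
        # select 'close' first so non-numeric columns like 'Name' are not aggregated
        series = df.groupby("date", as_index=False)["close"].sum()
    out = series.rename(columns={"date": "ds", "close": "y"})
    out = out.assign(ds=pd.to_datetime(out["ds"]))
    return out.sort_values("ds").reset_index(drop=True)


# Hypothetical long-format data: three tickers over two days.
long_df = pd.DataFrame({
    "date": ["2013-02-08"] * 3 + ["2013-02-11"] * 3,
    "Name": ["AAA", "BBB", "CCC"] * 2,
    "close": [15.0, 250.0, 40.0, 15.5, 248.0, 41.0],
})

one = to_prophet_frame(long_df, "BBB")  # single stock
total = to_prophet_frame(long_df)       # all stocks summed per day
print(total)
```

Either frame can then be passed to model.fit() exactly as in the question.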
Passing the summed data to model.fit() gives a reasonable solution:
[Figure: Prophet forecast for the summed daily closing prices]