Reputation: 21
I'm a beginner and I'm building a linear regression model using the statsmodels.formula.api.OLS() function in Python. I fit the model on the training data and used the predict() function on my test data (test_new) to get my predicted values. I stored the predicted values in y_pred.
import statsmodels.formula.api as sm
result = sm.OLS(y_train, train_new).fit()
y_pred = result.predict(test_new)
When I printed y_pred it came out as a numpy array, while y_test is a pandas Series.
In[44]: type(y_pred)
Out[44]: numpy.ndarray
In[45]: type(y_test)
Out[45]: pandas.core.series.Series
I want to create a new Pandas dataframe which has y_test as one column and y_pred as another column, adjacent to each other, and store it into a csv file, so that it would be easier to compare them side by side in adjacent columns. But, when I try
pd.DataFrame(y_pred, y_test, columns=['predictions', 'actual']).to_csv('prediction.csv')
I get
ValueError: Shape of passed values is (1, 5039), indices imply (2, 5039)
When I tried converting y_pred array into a dataframe and then concatenating it to y_test dataframe using
pd.concat([df1, df2], axis=1)
I get a blank column of empty cells of y_test adjacent to values of y_pred. Whatever I try I'm simply unable to produce a dataframe/csv file with two adjacent columns of actual and predicted values. What do you people suggest?
Upvotes: 2
Views: 4746
Reputation:
pd.DataFrame(y_pred, y_test, columns=['predictions', 'actual']).to_csv('prediction.csv')
If you take a look at the parameters of pd.DataFrame, you will see that the DataFrame construction part of the above line is actually equivalent to:
pd.DataFrame(data=y_pred, index=y_test, columns=['predictions', 'actual'])
This is because y_test is the second positional parameter, which pandas treats as the index, so you need to tell pandas explicitly that it belongs in data. As written, you are passing only one array as data while naming two columns, which is why you get the error. An easy way to pass two arrays as two columns is to use a dictionary:
pd.DataFrame(data={'predictions': y_pred, 'actual': y_test})
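Here is a minimal sketch of the dictionary approach end to end, including the CSV step from the question. The small y_test and y_pred below are made-up stand-ins for your data, not your actual values. It also shows a pd.concat variant: the blank column you saw was most likely caused by mismatched indexes (a DataFrame built from y_pred gets a fresh 0..n-1 index, while y_test keeps its original row labels from the split), so giving the predictions the same index as y_test first should let the rows line up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data: y_test is a Series with a
# non-default index (as a train/test split typically leaves it), y_pred is
# a plain numpy array of the same length.
y_test = pd.Series([3.0, 5.0, 7.0], index=[10, 4, 22], name="actual")
y_pred = np.array([2.9, 5.2, 6.8])

# Dictionary form: the array is placed positionally against y_test's index.
comparison = pd.DataFrame({"predictions": y_pred, "actual": y_test})

# concat form: wrap the predictions in a Series that shares y_test's index,
# so that axis=1 alignment pairs the rows instead of filling with NaN.
pred_series = pd.Series(y_pred, index=y_test.index, name="predictions")
comparison_concat = pd.concat([pred_series, y_test], axis=1)

# Write the side-by-side columns; index=False leaves out the row labels.
comparison.to_csv("prediction.csv", index=False)
```

Either form should give you two adjacent columns; drop index=False if you want to keep the original row labels in the file.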
Upvotes: 2