user7939273

Python - Plotting and linear regression - x and y must be the same size

I'm teaching myself some more tricks with Python and scikit-learn, and I'm trying to plot a linear regression model. My code can be seen below. The console gives the following error: x and y must be the same size. The program still runs to the end of my code, but nothing gets plotted.

To fix the size error, the first thing that came to mind was testing the length of x and y with something like len(x) == len(y). But as far as I can tell, my data seems to be the same length. Maybe the error is referring to something other than length (if so, I'm not sure what). Would really appreciate any help.


from sklearn.model_selection import train_test_split
from sklearn import linear_model
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Create linear regression object
regr = linear_model.LinearRegression()

# Load the CSV file with pandas
df = pd.read_csv("pokemon.csv")
# Remove all string columns
df = df.drop(['Name','Type_1','Type_2','isLegendary','Color','Pr_Male','hasGender','Egg_Group_1','Egg_Group_2','hasMegaEvolution','Body_Style'], axis=1)

y = df.Catch_Rate

# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection.train_test_split
x_train, x_test, y_train, y_test = train_test_split(df, y, test_size=0.25, random_state=0)

# Train the model using the training sets
regr.fit(x_train, y_train)

# Make predictions using the testing set
pokemon_y_pred = regr.predict(x_test)

print(pokemon_y_pred)

# Plot outputs
plt.title("Linear Regression Model of Catch Rate")
plt.scatter(x_test, y_test,  color='black')
plt.plot(x_test, pokemon_y_pred, color='blue', linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()

Upvotes: 1

Views: 6473

Answers (2)

Manjot singh

Reputation: 1

This error occurs when there are several x values for each y value. Here, x_test has more columns than y_test, which is why the sizes do not match. A scatter plot needs exactly one x value for each y value.

Upvotes: 0

xyzjayne

Reputation: 1387

The error refers to the fact that your x variable has more than one dimension. plt.plot and plt.scatter draw 2D plots, so they need one x value per y value, but your x_test has multiple feature columns while y_test and pokemon_y_pred are one-dimensional.
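Two common ways around this are sketched below, assuming you still want to fit on all features. The data is synthetic (three made-up features and a made-up target) because I don't have your CSV, so treat the variable names and numbers as placeholders. Option 1 plots the target against a single feature column; option 2 plots predicted against actual values, which stays one-dimensional no matter how many features the model uses.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 3 made-up features, 1 target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
regr = LinearRegression().fit(x_train, y_train)
y_pred = regr.predict(x_test)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: pick one feature column so x and y are both one-dimensional
ax1.scatter(x_test[:, 0], y_test, color="black")
ax1.set_xlabel("feature 0")
ax1.set_ylabel("target")

# Option 2: predicted vs. actual, which works for any number of features
ax2.scatter(y_test, y_pred, color="blue")
lims = [y_test.min(), y_test.max()]
ax2.plot(lims, lims, color="red")  # reference line where prediction == actual
ax2.set_xlabel("actual")
ax2.set_ylabel("predicted")

fig.tight_layout()
# Call plt.show() or fig.savefig(...) to view the figure
```

With a DataFrame like the one in the question, the equivalent of x_test[:, 0] would be selecting a single column, for example x_test.iloc[:, 0].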

Upvotes: 3
