Tranquil Oshan

Reputation: 47

Scatter plot throws ValueError

I am trying the following code:

# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
from sklearn.linear_model import LogisticRegression
from sklearn import linear_model
model = linear_model.LogisticRegression()
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score

X=scaler.fit_transform(X)

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2)

model.fit(X_train,y_train)
# Make predictions using the testing set
powerOutput_y_pred = model.predict(X_test)
print (powerOutput_y_pred)
# The coefficients
print('Coefficients: \n', model.coef_)
# The mean squared error
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, powerOutput_y_pred))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % r2_score(y_test, powerOutput_y_pred))

plt.scatter(X_test, y_test,  color='black')
plt.plot(X_test, powerOutput_y_pred, color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()

But I am getting the following error for the scatter plot:

ValueError: x and y must be the same size

If I run df.head(), I get the following table:

[image: df structure]

The features X and y are as below:

X=df.values[:,[0,1,2,3,4,5,7]]
y=df.values[:,6]

Running X.shape gives (25, 7) and y.shape gives (25,). How do I fix this shape mismatch?

Upvotes: 2

Views: 1679

Answers (1)

tel

Reputation: 13999

Simplest answer

Just use plot instead of scatter:

plt.plot(X_test, y_test, ls="none", marker='.', ms=12)

This plots each column of X_test against the same y_test values, one series per column. It works whenever x.shape == (n, d) and y.shape == (n,), as in your question above.

Simple answer

Loop over the columns of your x values, and call scatter once for each column:

# assumes numpy is imported as np; one color per feature column
colors = plt.cm.viridis(np.linspace(0.0, 1.0, X_test.shape[1]))
for xcol, c in zip(X_test.T, colors):
    plt.scatter(xcol, y_test, color=c)

Passing one row of the colors array to each scatter call plots every feature in a different color. If you want them all to be black, drop the colors array and pass color='black' instead.

details

scatter expects one list of x values and one list of y values. It's simplest if the x and y lists are 1D. However, you can also plot multiple sets of x and y data stored in 2D arrays, as long as those arrays have matching shapes.
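As a rough sketch of the matching-shape case, the snippet below repeats y across the feature axis so that a single scatter call accepts it. The data is made up, and the names X, y and y_2d are only illustrative (they are not the arrays from the question); the Agg backend is selected just so the sketch runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so no window is needed
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
X = rng.rand(5, 7)   # made-up data: 5 samples, 7 features
y = rng.rand(5)      # one target value per sample

# Repeat y once per feature column so its shape matches X exactly.
y_2d = np.tile(y[:, np.newaxis], (1, X.shape[1]))

fig, ax = plt.subplots()
# scatter flattens both arrays internally, so all 35 points are drawn at once.
points = ax.scatter(X, y_2d, color="black")
n_points = points.get_offsets().shape[0]
```

The trade-off is that all features end up in one collection, so they cannot easily be given different colors per feature without also building a matching color array.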

From the Matplotlib docs:

Fundamentally, scatter works with 1-D arrays; x, y, s, and c may be input as 2-D arrays, but within scatter they will be flattened.

A bit vague, but a dive into the Matplotlib source code confirms that the shapes of x and y have to match exactly. The code that handles shapes for plot is more flexible, so for that function you can get away with using one set of y data for many sets of x data.
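A small sketch of that difference, using made-up arrays with the same kind of shapes as in the question ((n, d) for x and (n,) for y). The variable names are illustrative only, and the Agg backend is assumed so the code runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so no window is needed
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
X = rng.rand(5, 7)   # made-up data: 5 samples, 7 features
y = rng.rand(5)      # one target value per sample

fig, ax = plt.subplots()

# scatter compares the total number of elements in x and y, so (5, 7) vs (5,) fails.
try:
    ax.scatter(X, y)
    scatter_error = None
except ValueError as exc:
    scatter_error = str(exc)

# plot pairs each column of X with the single y vector and returns one Line2D per column.
lines = ax.plot(X, y, ls="none", marker=".", ms=12)
```

After running this, scatter_error holds the ValueError message and lines contains one entry per feature column.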

Normally plot draws lines instead of dots, but you can turn the lines off by setting ls (i.e. linestyle) to "none", and you can turn dots on by setting marker. ms (i.e. markersize) controls the size of the dots.

example

The example you posted above won't run (X and y aren't defined), but here's a complete example with output:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

from sklearn import datasets
from sklearn.model_selection import train_test_split

d = datasets.load_diabetes()
features = d.data.shape[1]

X = d.data[:50,:]
Y = d.target[:50]

sample_weight = np.random.RandomState(442).rand(Y.shape[0])

# split train, test for calibration
X_train, X_test, Y_train, Y_test, sw_train, sw_test = \
    train_test_split(X, Y, sample_weight, test_size=0.9, random_state=442)

# use the plot function instead of scatter
# plot one set of y data against several sets of x data
plt.plot(X_test, Y_test, ls="none", marker='.', ms=12)

# call .scatter() multiple times in a loop
#colors = plt.cm.viridis(np.linspace(0.0, 1.0, features))
#for xcol,c in zip(X_test.T, colors):
#    plt.scatter(xcol, Y_test, c=c)

# display the figure
plt.show()

output:

[image: Y_test plotted against each column of X_test, one color per feature]

Upvotes: 2
