rigved

Reputation: 115

scipy.interpolate.make_interp_spline gives “x and y are incompatible” error

I'm trying to plot a smooth frequency distribution curve. The code works for one dataset but raises the following error for another:

spl1 = make_interp_spline(bins1, data1['Frequency'].values)

File "/<path_to_anaconda3>/envs/mlpy37/lib/python3.7/site-packages/scipy/interpolate/_bsplines.py", line 805, in make_interp_spline
    raise ValueError('x and y are incompatible.')
ValueError: x and y are incompatible.

Here is the code with the dataset that works fine:

import math
import numpy as np
import pandas as pd
import statistics
from scipy.stats import skew
from matplotlib import pyplot as plt
from scipy.interpolate import make_interp_spline

raw_data1 = [212, 869, 220, 654, 11, 624, 420, 121, 428, 865, 799, 405, 230, 670, 870, 366, 99, 55, 489, 312, 493, 163, 221, 84, 144, 48, 375, 86, 168, 100]
min_value1 = min(raw_data1)
max_value1 = max(raw_data1)
step1 = math.ceil((max_value1 - min_value1) / 10)
bin_edges1 = [i for i in range(min_value1 - 1, max_value1 + 1, step1)]
bins1 = [i for i in range(min_value1, max_value1 + 1, step1)]
if max(bin_edges1) < max_value1:
    bin_edges1.append(max(bin_edges1) + step1)
    bins1.append(max(bins1) + step1)
data1 = pd.DataFrame({'Frequency': pd.cut(raw_data1, bin_edges1).value_counts()})
x1 = np.linspace(min(bins1), max(bins1), 250)
spl1 = make_interp_spline(bins1, data1['Frequency'].values)
smooth_curve1 = spl1(x1)

print(data1)
mean1 = statistics.mean(raw_data1)
median1 = statistics.median(raw_data1)
print('Mean: {:.2f}'.format(mean1))
print('Median: {:.2f}'.format(median1))
try:
    print('Mode: {:.2f}'.format(statistics.mode(raw_data1)))
except Exception as e:
    print(e)
skewness1 = skew(raw_data1)
if mean1 > median1:
    print('Positive Skewness: ' + str(skewness1))
elif mean1 < median1:
    print('Negative Skewness: ' + str(skewness1))
else:
    print('No skewness: ' + str(skewness1))

plt.figure()

plt.subplot(111)
plt.plot(x1, smooth_curve1)
plt.title('Numerical Variables Exercise Skewness')
plt.xlabel('Data')
plt.ylabel('Frequency')

plt.show()

If I replace raw_data1 in the code above with the following dataset, it fails:

raw_data1 = [586, 760, 495, 678, 559, 415, 370, 659, 119, 288, 241, 787, 522, 207, 160, 526, 656, 848, 720, 676, 581, 929, 653, 661, 770, 800, 529, 975, 995, 947]

The full traceback is:

Traceback (most recent call last):
  File "/<path_to_file>/NumericalVariablesExercise_Skewness.py", line 20, in <module>
    spl1 = make_interp_spline(bins1, data1['Frequency'].values)
  File "/<path_to_anaconda3>/envs/mlpy37/lib/python3.7/site-packages/scipy/interpolate/_bsplines.py", line 805, in make_interp_spline
    raise ValueError('x and y are incompatible.')
ValueError: x and y are incompatible.

Could someone please assist in identifying the error in my code or logic?

Upvotes: 1

Views: 2246

Answers (1)

Péter Leéh

Reputation: 2119

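Commenting out one line fixes the error (at least the code runs; I can't validate the output). The error message points at the cause: x and y must have the same length. With the second dataset the if branch runs, so bin_edges1 grows to 11 edges, which pd.cut turns into 10 intervals and therefore 10 frequency values, while bins1 also grows to 11 entries. Extending only bin_edges1 keeps both at 10: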

if max(bin_edges1) < max_value1:
    bin_edges1.append(max(bin_edges1) + step1)
    # bins1.append(max(bins1) + step1) <-- this one
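
To see the mismatch without running scipy, here is a minimal sketch that rebuilds the two lists from the question and compares their lengths. The helper `build_bins` and its `extend_bins` flag are hypothetical names introduced only for this illustration; the arithmetic is copied from the question's code.

```python
import math

def build_bins(raw_data, n_bins=10, extend_bins=False):
    """Rebuild bin_edges1 / bins1 the way the question does.

    extend_bins=True reproduces the original code,
    extend_bins=False applies the fix (bins is not extended).
    """
    lo, hi = min(raw_data), max(raw_data)
    step = math.ceil((hi - lo) / n_bins)
    bin_edges = list(range(lo - 1, hi + 1, step))
    bins = list(range(lo, hi + 1, step))
    if max(bin_edges) < hi:
        bin_edges.append(max(bin_edges) + step)
        if extend_bins:
            bins.append(max(bins) + step)
    return bin_edges, bins

failing = [586, 760, 495, 678, 559, 415, 370, 659, 119, 288,
           241, 787, 522, 207, 160, 526, 656, 848, 720, 676,
           581, 929, 653, 661, 770, 800, 529, 975, 995, 947]

for extend in (True, False):
    edges, bins = build_bins(failing, extend_bins=extend)
    # pd.cut yields one interval per pair of adjacent edges
    print(extend, len(bins), len(edges) - 1)
# prints "True 11 10", then "False 10 10"
```

For the dataset that works, the if branch is never entered, which is why the extra append went unnoticed there.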

Additionally, your code is hard to follow because you mix tools. You define raw_data1 as a plain Python list, and bins1 as well, using a list comprehension.

raw_data1 = [212, 869, 220, 654, 11, 624, 420, 121, 428, 865, 799, 405, 230, 670, 870, 366, 99, 55, 489, 312, 493, 163, 221, 84, 144, 48, 375, 86, 168, 100]
..
bins1 = [i for i in range(min_value1, max_value1 + 1, step1)]

Then you use numpy.linspace for x1.

x1 = np.linspace(min(bins1), max(bins1), 250)

You also bring in pandas:

data1 = pd.DataFrame({'Frequency': pd.cut(raw_data1, bin_edges1).value_counts()})

I'd recommend sticking mainly to one library and reaching for the others only when necessary.
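
As a rough sketch of what a NumPy-only version could look like: np.histogram returns the counts and the edges together, so their lengths cannot drift apart. Note the assumptions here: this uses 10 equal-width bins between the minimum and maximum and the bin centers as x values, which is not the same binning as the integer steps in the question, so the counts may differ slightly.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

raw_data = np.array([586, 760, 495, 678, 559, 415, 370, 659, 119, 288,
                     241, 787, 522, 207, 160, 526, 656, 848, 720, 676,
                     581, 929, 653, 661, 770, 800, 529, 975, 995, 947])

# counts has one entry per bin, edges has one more entry than counts
counts, edges = np.histogram(raw_data, bins=10)

# Use the midpoint of each bin as the x value (an assumption of this sketch)
centers = (edges[:-1] + edges[1:]) / 2

# Cubic interpolating spline through the (center, count) points
spline = make_interp_spline(centers, counts)
x = np.linspace(centers[0], centers[-1], 250)
smooth_curve = spline(x)
```

smooth_curve can then be passed to plt.plot(x, smooth_curve) exactly as in the question.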

Upvotes: 3
