Reputation: 43
I am trying to calculate the Pearson correlation coefficient from data loaded from a CSV file. When I run my code, I get a length mismatch error and I am not sure how to get rid of it.
This is Python 3. I have tried storing the values in a new variable.
import numpy as np
import pandas as pd

def correl(filename, header_one, header_two):
    df = pd.read_csv(filename)
    df.columns = [header_one, header_two]  # this line raises the ValueError shown below
    x = np.asarray(df[header_one])
    y = np.asarray(df[header_two])
    n_times_sum_of_x_times_y = len(x) * np.sum(np.multiply(x, y))
    sum_of_x = np.sum(x)
    sum_of_y = np.sum(y)
    n_times_sum_of_x_squared = len(x) * np.sum(np.multiply(x, x))
    n_times_sum_of_y_squared = len(x) * np.sum(np.multiply(y, y))
    sum_of_x_squared = sum_of_x ** 2
    sum_of_y_squared = sum_of_y ** 2
    numerator = n_times_sum_of_x_times_y - sum_of_x * sum_of_y
    denominator_squared = (n_times_sum_of_x_squared - sum_of_x_squared) * (n_times_sum_of_y_squared - sum_of_y_squared)
    denominator = np.sqrt(denominator_squared)
    p_correl_coefficient = numerator / denominator
    return p_correl_coefficient

#print(correl("imdb.csv", 'gross', 'budget'))
The actual error I get when I run my code:
  File "C:\Users\ruthi\Anaconda3\lib\site-packages\pandas\core\generic.py", line 638, in _set_axis
    self._data.set_axis(axis, labels)
  File "C:\Users\ruthi\Anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 155, in set_axis
    'values have {new} elements'.format(old=old_len, new=new_len))
ValueError: Length mismatch: Expected axis has 23 elements, new values have 2 elements
Upvotes: 2
Views: 2674
Reputation: 3495
It means that the file has 23 columns, but you are trying to rename them with only 2 names. Assigning to df.columns replaces every column label, so the new list must be the same length as the existing one. If header_one and header_two are the names of existing columns, select them with df = df[[header_one, header_two]]. If not, then first pick the two columns you are interested in, and only then rename them with df.columns = [header_one, header_two].
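To illustrate the first case, here is a minimal sketch of the function with the rename replaced by a column selection. It assumes header_one and header_two already exist as column names in the file; the small in-memory CSV is made-up sample data standing in for imdb.csv, and np.corrcoef is used only as a cross-check.

```python
import io

import numpy as np
import pandas as pd


def correl(source, header_one, header_two):
    """Pearson correlation between two existing columns of a CSV source."""
    df = pd.read_csv(source)
    # Select the two columns by name instead of relabelling every column.
    df = df[[header_one, header_two]]
    x = df[header_one].to_numpy(dtype=float)
    y = df[header_two].to_numpy(dtype=float)
    n = len(x)
    numerator = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    denominator = np.sqrt(
        (n * np.sum(x * x) - np.sum(x) ** 2)
        * (n * np.sum(y * y) - np.sum(y) ** 2)
    )
    return numerator / denominator


# Hypothetical sample data; the real file has 23 columns.
sample_csv = "title,gross,budget\na,10,4\nb,20,9\nc,30,11\nd,45,20\n"
r = correl(io.StringIO(sample_csv), "gross", "budget")

# Cross-check against NumPy's built-in correlation matrix.
r_numpy = np.corrcoef([10, 20, 30, 45], [4, 9, 11, 20])[0, 1]
print(r, r_numpy)
```

With a real file you would pass the path instead of the StringIO object, for example correl("imdb.csv", "gross", "budget"). If the columns contain missing values, they would need to be dropped first, since the sums above would otherwise evaluate to NaN.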
Upvotes: 1