Ruth Ineh

Reputation: 43

ValueError: Length mismatch: Expected axis has 23 elements, new values have 2 elements. Pandas length mismatch

I am trying to calculate the Pearson correlation coefficient from data loaded from a CSV file. When I run my code, I get the length mismatch error, and I am not sure how to get rid of it.

This is Python 3. I have tried storing the values in a new variable.

import numpy as np
import pandas as pd

def correl(filename, header_one, header_two):

    df = pd.read_csv(filename)

    df.columns = [header_one, header_two]

    x = np.asarray(df[header_one])
    y = np.asarray(df[header_two])

    n_times_sum_of_x_times_y = len(x) * np.sum(np.multiply(x,y))

    sum_of_x = np.sum(x)
    sum_of_y = np.sum(y)


    n_times_sum_of_x_squared = len(x) * np.sum(np.multiply(x, x))
    n_times_sum_of_y_squared = len(x) * np.sum(np.multiply(y, y))

    sum_of_x_squared = sum_of_x ** 2
    sum_of_y_squared = sum_of_y ** 2

    numerator = n_times_sum_of_x_times_y - sum_of_x * sum_of_y
    denominator_squared = (n_times_sum_of_x_squared - sum_of_x_squared) * (n_times_sum_of_y_squared - sum_of_y_squared)
    denominator = np.sqrt(denominator_squared)

    p_correl_coefficient = (numerator / denominator)

    return p_correl_coefficient

#print(correl("imdb.csv", 'gross', 'budget'))

The actual error I get when I run my code:

File "C:\Users\ruthi\Anaconda3\lib\site-packages\pandas\core\generic.py", line 638, in _set_axis
    self._data.set_axis(axis, labels)

File "C:\Users\ruthi\Anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 155, in set_axis
    'values have {new} elements'.format(old=old_len, new=new_len))
ValueError: Length mismatch: Expected axis has 23 elements, new values have 2 elements

Upvotes: 2

Views: 2674

Answers (1)

Aryerez

Reputation: 3495

It means that the file has 23 columns, but you are trying to rename them with a list of only 2 names. Assigning to df.columns replaces every column label, so the list must have exactly as many entries as the DataFrame has columns. If header_one and header_two are names of existing columns, select them with df = df[[header_one, header_two]]. If not, first pick the two columns you are interested in, and then rename them with df.columns = [header_one, header_two].
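As a rough sketch of both options, assuming the CSV has a header row: the version of correl below selects the two columns by name instead of renaming all of them. The inline csv_text is a made-up stand-in for imdb.csv, and the names x_name and y_name in the second part are purely illustrative.

```python
import io

import numpy as np
import pandas as pd


def correl(source, header_one, header_two):
    # Option 1: the headers already exist in the file, so select those
    # two columns rather than assigning two names to all 23 columns.
    df = pd.read_csv(source)
    df = df[[header_one, header_two]]

    x = np.asarray(df[header_one], dtype=float)
    y = np.asarray(df[header_two], dtype=float)
    n = len(x)

    numerator = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    denominator = np.sqrt(
        (n * np.sum(x * x) - np.sum(x) ** 2)
        * (n * np.sum(y * y) - np.sum(y) ** 2)
    )
    return numerator / denominator


# Made-up sample data standing in for imdb.csv (assumption, not real data).
csv_text = "title,gross,budget\nA,10,1\nB,20,2\nC,30,4\n"
r = correl(io.StringIO(csv_text), "gross", "budget")

# Option 2: pick two columns by position first, then rename them.
# After the selection the frame has exactly two columns, so a
# two-element list of names is valid.
renamed = pd.read_csv(io.StringIO(csv_text)).iloc[:, [1, 2]]
renamed.columns = ["x_name", "y_name"]
```

If hand-rolling the formula is not required, df[header_one].corr(df[header_two]) should give the same Pearson coefficient directly.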

Upvotes: 1
