brohjoe
brohjoe

Reputation: 934

Length of values does not match length of index when using pandas

I'm getting 'ValueError: Length of values does not match length of index' while using Pandas. I'm reading in data from an Excel spreadsheet using Pandas' 'pd.read_excel method. I then filter the data using Pandas' filter method. I've created 'dataSubset' to represent the filtered data. I use 'dataSubset' to create several 'mean' columns representing the mean of multiple columns respectively. I then create 'finalData' which represents the pd.concat function concatenating all of the calculated mean columns together. This code runs perfectly; however, if I uncomment any additional columns, the code blows up and gives the aformentioned error.

What am I doing wrong? It works as long as I don't concatenate more than it wants.

Help.

import pandas as pd

dataIn = pd.read_excel('IDT/IDT-B.xlsx')

dataSubset = dataIn.filter([
    "First Name",
    "Last Name",
    "2.14.2 Control Structures Example Quiz",
    "2.14.4 Random Hurdles",
    "2.15.2 Quiz: Which Control Structure?",
    "2.16.2 How to Indent Your Code Quiz",
    "2.16.4 Diagonal",
    "2.16.5 Staircase",
    "2.17.2 Debugging Basics",
    "2.17.6 Debugging with Error Messages",
    "3.2.6 Programming with Karel Quiz",
    "5.1.2 Hello World Quiz",
    "5.1.4 Your Name and Hobby",
    "5.2.2 Variables Quiz",
    "5.2.4 Daily Activities",
    "5.3.2 User Input Quiz",
    "5.3.4 Dinner Plans",
    "5.4.2 Basic Math in JavaScript Quiz",
    "5.4.6 T-Shirt Shop",
    "5.4.7 Running Speed",
    "5.5.2 JavaScript Graphics Quiz",
    "5.5.8 Flag of the Netherlands",
    "5.5.9 Snowman",
    "5.6.2 Using RGB to Create Colors",
    "5.6.4 Exploring RGB",
    "5.6.5 Making Yellow",
    "5.6.6 Rainbow",
    "5.6.7 Create a Color Image!",
    "6.1.1 Ghost",
    "6.1.2 Fried Egg",
    "6.1.3 Draw Something",
    "6.1.4 JavaScript and Graphics Quiz"

], axis=1)
# If any of these dataframes are uncommented, the code blows up.

# dataSubset['aver_2.14'] = dataSubset[["2.14.2 Control Structures Example Quiz",
#                                       "2.14.4 Random Hurdles"]],
# dataSubset['aver_2.15'] = dataSubset[["2.15.2 Quiz: Which Control Structure?"]],
# #
# # dataSubset['aver_2.16'] = dataSubset[["2.16.2 How to Indent Your Code Quiz",
# #                                       "2.16.4 Diagonal"]],
# #
# # dataSubset['aver_2.17'] = dataSubset[["2.17.2 Debugging Basics",
# #                                       "2.17.6 Debugging with Error Messages", ]]

dataSubset['unit_quiz_326'] = dataSubset[["3.2.6 Programming with Karel Quiz"]]

dataSubset['aver_5.1'] = dataSubset[["5.1.2 Hello World Quiz",
                                    "5.1.4 Your Name and Hobby"]].mean(axis=1)

dataSubset['aver_5.2'] = dataSubset[["5.2.2 Variables Quiz",
                                    "5.2.4 Daily Activities"]].mean(axis=1)

dataSubset['aver_5.3'] = dataSubset[["5.3.2 User Input Quiz",
                                    "5.3.4 Dinner Plans"]].mean(axis=1)

dataSubset['aver_5.4'] = dataSubset[["5.4.2 Basic Math in JavaScript Quiz",
                                    "5.4.6 T-Shirt Shop",
                                    "5.4.7 Running Speed"]].mean(axis=1)

dataSubset['aver_5.5'] = dataSubset[["5.5.2 JavaScript Graphics Quiz",
                                    "5.5.8 Flag of the Netherlands",
                                    "5.5.9 Snowman"]].mean(axis=1)

dataSubset['aver_5.6'] = dataSubset[["5.6.2 Using RGB to Create Colors",
                                    "5.6.4 Exploring RGB",
                                    "5.6.5 Making Yellow",
                                    "5.6.6 Rainbow",
                                    "5.6.7 Create a Color Image!"]].mean(axis=1)

dataSubset['aver_6.1'] = dataSubset[["6.1.1 Ghost",
                                    "6.1.2 Fried Egg",
                                    "6.1.3 Draw Something",
                                    "6.1.4 JavaScript and Graphics Quiz"]].mean(axis=1)

finalData = pd.concat([dataSubset['First Name'],
                       dataSubset['Last Name'],
                       dataSubset['unit_quiz_326'],
                       # dataSubset['aver_2.14'],
                       # dataSubset['aver_2.15'],
                       # dataSubset['aver_2.16'],
                       # dataSubset['aver_2.17'],
                       dataSubset['aver_5.1'],
                       dataSubset['aver_5.2'],
                       dataSubset['aver_5.3'],
                       dataSubset['aver_5.4'],
                       dataSubset['aver_5.5'],
                       dataSubset['aver_5.6'],
                       dataSubset['aver_6.1']], axis=1)

finalData.to_excel('output/gradesOut.xlsx')

Upvotes: 0

Views: 9243

Answers (1)

Anton Yang-Wälder
Anton Yang-Wälder

Reputation: 934

Cause of ValueError

Based on this line:

dataSubset['aver_2.15'] = dataSubset[["2.15.2 Quiz: Which Control Structure?"]],

The right side of the assignment has a trailing comma, the line is equivalent to this:

dataSubset['aver_2.15'] = (dataSubset[["2.15.2 Quiz: Which Control Structure?"]], )

Basically, the line is trying to perform the following assignment:

pandas.Series <-- Tuple[pandas.DataFrame]  # tuple with length 1

So there is a length mismatch between the left side (assignment target) and the right side (object that should be assigned to the target):

  • Left side: Length of the Series (think "number of rows")
  • Right side: One

Why is it pandas.Series on the left, but pandas.DataFrame on the right?

  • If you use single square brackets, you get a Series object: s = df['a']
  • If you use double square brackets, you get a DataFrame object: df2 = df[['a'']]

Possible solution

It seems you want to combine multiple columns into a new column. In one of the working lines, you take the mean of two columns with .mean(axis=1):

dataSubset['aver_5.1'] = dataSubset[["5.1.2 Hello World Quiz",
                                    "5.1.4 Your Name and Hobby"]].mean(axis=1)

So, to fix your code, you probably need to:

  • Remove trailing commas
  • Add mean() or some other "combining function" to the lines where you select multiple columns

Upvotes: 6

Related Questions