Reputation: 934
I'm getting 'ValueError: Length of values does not match length of index' while using Pandas. I'm reading in data from an Excel spreadsheet using Pandas' 'pd.read_excel method. I then filter the data using Pandas' filter method. I've created 'dataSubset' to represent the filtered data. I use 'dataSubset' to create several 'mean' columns representing the mean of multiple columns respectively. I then create 'finalData' which represents the pd.concat function concatenating all of the calculated mean columns together. This code runs perfectly; however, if I uncomment any additional columns, the code blows up and gives the aformentioned error.
What am I doing wrong? It works as long as I don't concatenate more than it wants.
Help.
import pandas as pd
dataIn = pd.read_excel('IDT/IDT-B.xlsx')
dataSubset = dataIn.filter([
"First Name",
"Last Name",
"2.14.2 Control Structures Example Quiz",
"2.14.4 Random Hurdles",
"2.15.2 Quiz: Which Control Structure?",
"2.16.2 How to Indent Your Code Quiz",
"2.16.4 Diagonal",
"2.16.5 Staircase",
"2.17.2 Debugging Basics",
"2.17.6 Debugging with Error Messages",
"3.2.6 Programming with Karel Quiz",
"5.1.2 Hello World Quiz",
"5.1.4 Your Name and Hobby",
"5.2.2 Variables Quiz",
"5.2.4 Daily Activities",
"5.3.2 User Input Quiz",
"5.3.4 Dinner Plans",
"5.4.2 Basic Math in JavaScript Quiz",
"5.4.6 T-Shirt Shop",
"5.4.7 Running Speed",
"5.5.2 JavaScript Graphics Quiz",
"5.5.8 Flag of the Netherlands",
"5.5.9 Snowman",
"5.6.2 Using RGB to Create Colors",
"5.6.4 Exploring RGB",
"5.6.5 Making Yellow",
"5.6.6 Rainbow",
"5.6.7 Create a Color Image!",
"6.1.1 Ghost",
"6.1.2 Fried Egg",
"6.1.3 Draw Something",
"6.1.4 JavaScript and Graphics Quiz"
], axis=1)
# If any of these dataframes are uncommented, the code blows up.
# dataSubset['aver_2.14'] = dataSubset[["2.14.2 Control Structures Example Quiz",
# "2.14.4 Random Hurdles"]],
# dataSubset['aver_2.15'] = dataSubset[["2.15.2 Quiz: Which Control Structure?"]],
# #
# # dataSubset['aver_2.16'] = dataSubset[["2.16.2 How to Indent Your Code Quiz",
# # "2.16.4 Diagonal"]],
# #
# # dataSubset['aver_2.17'] = dataSubset[["2.17.2 Debugging Basics",
# # "2.17.6 Debugging with Error Messages", ]]
dataSubset['unit_quiz_326'] = dataSubset[["3.2.6 Programming with Karel Quiz"]]
dataSubset['aver_5.1'] = dataSubset[["5.1.2 Hello World Quiz",
"5.1.4 Your Name and Hobby"]].mean(axis=1)
dataSubset['aver_5.2'] = dataSubset[["5.2.2 Variables Quiz",
"5.2.4 Daily Activities"]].mean(axis=1)
dataSubset['aver_5.3'] = dataSubset[["5.3.2 User Input Quiz",
"5.3.4 Dinner Plans"]].mean(axis=1)
dataSubset['aver_5.4'] = dataSubset[["5.4.2 Basic Math in JavaScript Quiz",
"5.4.6 T-Shirt Shop",
"5.4.7 Running Speed"]].mean(axis=1)
dataSubset['aver_5.5'] = dataSubset[["5.5.2 JavaScript Graphics Quiz",
"5.5.8 Flag of the Netherlands",
"5.5.9 Snowman"]].mean(axis=1)
dataSubset['aver_5.6'] = dataSubset[["5.6.2 Using RGB to Create Colors",
"5.6.4 Exploring RGB",
"5.6.5 Making Yellow",
"5.6.6 Rainbow",
"5.6.7 Create a Color Image!"]].mean(axis=1)
dataSubset['aver_6.1'] = dataSubset[["6.1.1 Ghost",
"6.1.2 Fried Egg",
"6.1.3 Draw Something",
"6.1.4 JavaScript and Graphics Quiz"]].mean(axis=1)
finalData = pd.concat([dataSubset['First Name'],
dataSubset['Last Name'],
dataSubset['unit_quiz_326'],
# dataSubset['aver_2.14'],
# dataSubset['aver_2.15'],
# dataSubset['aver_2.16'],
# dataSubset['aver_2.17'],
dataSubset['aver_5.1'],
dataSubset['aver_5.2'],
dataSubset['aver_5.3'],
dataSubset['aver_5.4'],
dataSubset['aver_5.5'],
dataSubset['aver_5.6'],
dataSubset['aver_6.1']], axis=1)
finalData.to_excel('output/gradesOut.xlsx')
Upvotes: 0
Views: 9243
Reputation: 934
ValueError
Based on this line:
dataSubset['aver_2.15'] = dataSubset[["2.15.2 Quiz: Which Control Structure?"]],
The right side of the assignment has a trailing comma, the line is equivalent to this:
dataSubset['aver_2.15'] = (dataSubset[["2.15.2 Quiz: Which Control Structure?"]], )
Basically, the line is trying to perform the following assignment:
pandas.Series <-- Tuple[pandas.DataFrame] # tuple with length 1
So there is a length mismatch between the left side (assignment target) and the right side (object that should be assigned to the target):
Series
(think "number of rows")Why is it pandas.Series
on the left, but pandas.DataFrame
on the right?
Series
object: s = df['a']
DataFrame
object: df2 = df[['a'']]
It seems you want to combine multiple columns into a new column. In one of the working lines, you take the mean of two columns with .mean(axis=1)
:
dataSubset['aver_5.1'] = dataSubset[["5.1.2 Hello World Quiz",
"5.1.4 Your Name and Hobby"]].mean(axis=1)
So, to fix your code, you probably need to:
mean()
or some other "combining function" to the lines where you select multiple columnsUpvotes: 6