user3534918

Reputation: 73

Python list comparison issues

I need to write a program in Python that compares two parallel lists to grade a multiple choice exam. One list has the exam solution and the second list has a student's answers. The question number for each missed question is to be stored in a third list using the natural index numbers. The solution must use indexing.

I keep getting an empty list returned for the third list. All help much appreciated!

def main():
    exam_solution =   ['B', 'D', 'A', 'A', 'C', 'A', 'B', 'A', 'C', 'D', 'B', 'C',\
               'D', 'A', 'D', 'C', 'C', 'B', 'D', 'A']
    student_answers = ['B', 'D', 'B', 'A', 'C', 'A', 'A', 'A', 'C', 'D', 'B', 'C',\
               'D', 'B', 'D', 'C', 'C', 'B', 'D', 'A']

    questions_missed = []  

    for item in exam_solution:
        if item not in student_answers:
            questions_missed.append(item)

Upvotes: 1

Views: 245

Answers (3)

Thorsten Kranz

Reputation: 12765

One more solution comes to mind. I put it in a separate answer because it is "special".

Using numpy this task can be accomplished by:

import numpy as np
exam_solution = np.array(exam_solution)
student_answers = np.array(student_answers)

(exam_solution!=student_answers).nonzero()[0]

With NumPy arrays, == and != compare element by element and return a boolean array. .nonzero() returns a tuple of index arrays (one per dimension) for the elements that are nonzero, here the True entries, which is why the [0] is needed. That's it.
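To make the two steps concrete, here is a minimal sketch in Python 3. The five-element arrays are made up for illustration and are not the exam data from the question.

```python
import numpy as np

# Hypothetical shortened answer lists, for illustration only.
solution = np.array(['B', 'D', 'A', 'A', 'C'])
answers = np.array(['B', 'D', 'B', 'A', 'A'])

# Elementwise comparison yields a boolean array of the same length.
mismatch = solution != answers

# nonzero() returns a tuple with one index array per dimension;
# [0] picks the index array for the only dimension of a 1-D array.
missed = mismatch.nonzero()[0]

print(mismatch.tolist())  # [False, False, True, False, True]
print(missed.tolist())    # [2, 4]
```

The result is a NumPy array of zero-based indices; call .tolist() on it if a plain Python list is needed.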

Timing is really interesting now. For lists of your size, the timings in seconds are (N=19, repetitions=100,000):

list comprehension: 0.904024521544
loop: 0.936516107421
numpy: 0.349371968612

NumPy is already faster by a factor of about 2.6. Nice, but not amazing.

But when I increase the size of your lists by a factor of 100, I get (N=19*100=1900, repetitions=1000):

list comprehension: 0.866544042939
loop: 1.04464069977
numpy: 0.0334220694495

Now NumPy is faster by a factor of about 26 (versus the list comprehension) or 31 (versus the loop), which is definitely a lot.

Performance probably won't be a problem for you, but I thought it was worth pointing out.

Upvotes: 0

Thorsten Kranz

Reputation: 12765

questions_missed = [i for i, (ex,st) in enumerate(zip(exam_solution, student_answers)) if ex != st]

or alternatively, if you prefer loops over list comprehensions:

questions_missed = []
for i, (ex,st) in enumerate(zip(exam_solution, student_answers)):
    if ex != st:
        questions_missed.append(i)

Both give [2, 6, 13]. These are zero-based indices, i.e. questions 3, 7 and 14.

Explanation:

enumerate is a built-in function that returns an iterator yielding (index, value) tuples. Loosely speaking, it lets you "have the current index available during an iteration".

zip pairs up corresponding elements from two or more iterables (in your case lists) as tuples. In Python 2 it returns a list of these tuples; in Python 3 it returns a lazy iterator.
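A minimal sketch of what the two functions produce, written for Python 3 and using a made-up three-question exam (not the data from the question):

```python
# Hypothetical three-question exam, for illustration only.
solution = ['B', 'D', 'A']
answers = ['B', 'C', 'A']

# zip pairs corresponding elements; list() materialises the result
# (needed in Python 3, where zip is lazy).
pairs = list(zip(solution, answers))
print(pairs)  # [('B', 'B'), ('D', 'C'), ('A', 'A')]

# enumerate attaches a running index, starting at 0, to each pair.
indexed = list(enumerate(pairs))
print(indexed)  # [(0, ('B', 'B')), (1, ('D', 'C')), (2, ('A', 'A'))]
```

Unpacking each item as i, (ex, st) in the loop header then gives the index and both answers at once.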

I'd prefer the list comprehension version.
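As for why the code in the question returns an empty list: item not in student_answers asks whether the letter occurs anywhere in the student's list, and every letter from A to D does, so the condition is never true. Since the task asks for indexing explicitly, the same comparison can also be written with range(len(...)). This is a minimal sketch in Python 3; the function name find_missed and the + 1 (assuming that "natural index numbers" means question numbers counted from 1) are my assumptions, not part of the task statement.

```python
def find_missed(exam_solution, student_answers):
    """Return 1-based question numbers where the two lists differ."""
    questions_missed = []
    # Compare position by position instead of testing membership.
    for i in range(len(exam_solution)):
        if exam_solution[i] != student_answers[i]:
            # Assumption: question numbers start at 1, list indices at 0.
            questions_missed.append(i + 1)
    return questions_missed


exam_solution = ['B', 'D', 'A', 'A', 'C', 'A', 'B', 'A', 'C', 'D', 'B', 'C',
                 'D', 'A', 'D', 'C', 'C', 'B', 'D', 'A']
student_answers = ['B', 'D', 'B', 'A', 'C', 'A', 'A', 'A', 'C', 'D', 'B', 'C',
                   'D', 'B', 'D', 'C', 'C', 'B', 'D', 'A']

print(find_missed(exam_solution, student_answers))  # [3, 7, 14]
```

Drop the + 1 if zero-based indices are wanted instead.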

If I add some timing code, I see that performance doesn't really differ here:

def list_comprehension_version():
    questions_missed = [i for i, (ex,st) in enumerate(zip(exam_solution, student_answers)) if ex != st]
    return questions_missed

def loop_version():
    questions_missed = []

    for i, (ex,st) in enumerate(zip(exam_solution, student_answers)):
        if ex != st:
            questions_missed.append(i)

    return questions_missed

import timeit

# The timed statement must call the function, hence the parentheses;
# a bare name would only time the name lookup.
print "list comprehension:", timeit.timeit("list_comprehension_version()", "from __main__ import exam_solution, student_answers, list_comprehension_version", number=10000000)
print "loop:", timeit.timeit("loop_version()", "from __main__ import exam_solution, student_answers, loop_version", number=10000000)

gives:

list comprehension: 0.895029446804
loop: 0.877159359719
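A note on timeit: the statement string has to actually call the function (with parentheses), otherwise only the name lookup is timed. One way to sidestep that is to pass the callable itself, which timeit.timeit also accepts. The following is a minimal sketch in Python 3; the shortened lists are made up and the small repetition count is chosen only to keep the run short.

```python
import timeit

# Hypothetical shortened lists, for illustration only.
exam_solution = ['B', 'D', 'A', 'A', 'C']
student_answers = ['B', 'D', 'B', 'A', 'A']


def list_comprehension_version():
    return [i for i, (ex, st) in enumerate(zip(exam_solution, student_answers))
            if ex != st]


def loop_version():
    questions_missed = []
    for i, (ex, st) in enumerate(zip(exam_solution, student_answers)):
        if ex != st:
            questions_missed.append(i)
    return questions_missed


# Passing the function object makes timeit call it on every repetition.
t_comp = timeit.timeit(list_comprehension_version, number=10000)
t_loop = timeit.timeit(loop_version, number=10000)
print("list comprehension:", t_comp)
print("loop:", t_loop)
```

The absolute numbers depend on the machine and Python version, so only the ratio between the two is meaningful.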

Upvotes: 5

logc

Reputation: 3923

A solution based on iterators

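questions_missed = list(index for (index, _)
                        in filter(
                            # pair is (index, (answer, solution)); indexing instead of
                            # tuple-unpacking lambda parameters also works in Python 3
                            lambda pair: pair[1][0] != pair[1][1],
                            enumerate(zip(student_answers, exam_solution))))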

For the purists: in Python 2, import the lazy equivalents of zip and filter (izip and ifilter) from itertools so that no intermediate lists are built. In Python 3 the built-in zip and filter are already lazy.
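To illustrate the lazy behaviour, here is a minimal sketch in Python 3 with made-up five-element lists (not the exam data from the question):

```python
# Hypothetical shortened lists, for illustration only.
student_answers = ['B', 'D', 'B', 'A', 'A']
exam_solution = ['B', 'D', 'A', 'A', 'C']

# In Python 3, zip and filter return lazy iterators, so nothing is
# compared until the result is consumed.
indexed_pairs = enumerate(zip(student_answers, exam_solution))
mismatches = filter(lambda pair: pair[1][0] != pair[1][1], indexed_pairs)

# next() pulls items only until the first mismatch is found.
first = next(mismatches)
# The comprehension consumes whatever is left in the iterator.
rest = [index for index, _ in mismatches]

print(first)  # (2, ('B', 'A'))
print(rest)   # [4]
```

Because the iterator is consumed as it goes, wrap it in list(...) once if the indices are needed more than one time.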

Upvotes: 0
