John Iskra
John Iskra

Reputation: 165

I don't understand why the for loop is not working

I'm running python 3.7.3 on ubuntu 19.04 using visual studio code 1.39.1. I'm a bit of a noob so please be patient with me.

I'm working my way through computation and programming using Python. The authors give this example which I typed into VSC, I'm 99% sure, accurately, but it's not doing what the program is supposed to, namely finding the intersection of two lists:

def intersect(L1,L2):
    tmp = []
    for e1 in L1:
        for e2 in L2:
            if e1 == e2:
                tmp.append(e1)
                break
            result = []
            for e in tmp:
                if e not in result:
                    result.append(e)
            return result

S = [2,3]
T = [2,3,4]
intersect(S,T)

The output is [2] rather than [2,3], so it seems like the program is not running through S, but I can't figure out why.

Thank you in advance.

Upvotes: 3

Views: 175

Answers (2)

RoyM
RoyM

Reputation: 746

Now that Adam has given you a working solution, there's also something to be said about the efficiency of your algorithm. Be aware that this will go beyond the scope of your question.

Notice that for every element in your first list, there's a check against every element in your second list. That means if both lists have n elements, you will do at least n^2 comparisons. Imagine that n gets large, say 100000, and the range of sizes in which the elements may appear be sufficiently large, will your algorithm be able to handle this? Generally you want to try to avoid nested for loops, if possible. And in this case it is:

Let's start by importing a tool to give us random numbers because I'm way too lazy to generate 100000 numbers by hand:

from random import randrange as rr

Now let's set some high parameters to test the algorithm:

list_length = 10000
random_space = 200000

Let's create two lists using these parameters:

l1 = [rr(0,random_space,1) for i in range(list_length)]
l2 = [rr(0,random_space,1) for i in range(list_length)]

Now here's your correctly indented algorithm, with one variable checks added with tells us roughly how many times we compare two elements:

def intersect(L1,L2):
    tmp = []
    checks = 0
    for e1 in L1:
        for e2 in L2:
            checks += 1
            if e1 == e2:
                tmp.append(e1)
                break
    result = []
    for e in tmp:
        if e not in result:
            result.append(e)
    return result, checks

Let's print out our result:

res, checks = intersect(l1, l2)
print(res)
print("Number of comparisons: "+str(checks))

Now run the code. I think you'll find that everything works, but it takes a bit of time and the number of comparisons is probably close to 100 million! Try changing the list_length from 10000 to 100000, will the code even finish when you run it now?

Now let's make a few changes. We delete the last three lines. Then we introduce a modified version of the intersect-function:

def intersect_v2(L1, L2):
    dict = {}
    checks = 0
    for e1 in L1:
        checks += 1
        dict[e1] = 1
    for e2 in L2:
        checks += 1
        if e2 in dict:
            dict[e2] += 1
    return [e for e in dict if dict[e] > 1], checks

Notice that we only run through each lists once here. How big of a difference does that make?

Let's test it. Change list_length back to 10000. Then add the last three lines back in, using the modified intersect-function:

res, checks = intersect_v2(l1, l2)
print(res)
print("Number of comparisons: "+str(checks))

Run the code and see. Then try changing the list_length back to 100000 and run it again. Does it complete in a reasonable time now?

There's probably even more efficient ways to do this. But getting rid of the nested for-loop is often a big deal.

Upvotes: 1

Adam
Adam

Reputation: 17389

Python uses indentation to mark the beginning and end a block of code such as a function or loop body:

for condition:
    statement_in_loop()
    statement_in_loop()

statement_outside_of_loop()

If you were to indent that last line it would be inside the loop. Some languages use braces to mark the body of a loop, Python doesn't.

Be especially careful when copy/pasting code. Often editors will try to helpfully fix your indentation for you and end up changing the logic.

Your code appears to over-indent the final loop, which makes it run too many times.

Try this:

def intersect(L1,L2):
    tmp = []
    for e1 in L1:
        for e2 in L2:
            if e1 == e2:
                tmp.append(e1)
                break
    result = []
    for e in tmp:
        if e not in result:
            result.append(e)
    return result

S = [2,3]
T = [2,3,4]
intersect(S,T)

Upvotes: 3

Related Questions