chrisfs

Reputation: 6333

Python: Not finding value in list that should be there

I am trying to find entries with specific zip codes in a mailing list (CSV format). I thought this should work, but it never finds anything, even though I know the sought-after zip codes are there.

text = open("during1.txt","r")
a = list(range(93201,93399))
b = list(range(93529,93535))
c = list(range(93601,93899))
d = list(range(95301,95399))
KFCFzip = a+b+c+d
output = open("output.txt","w")

for line in text:
    array= line.strip().split(",")
    print(array[6][0:5])
    if array[6][0:5] in KFCFzip:
        #output.write(array)
        print("yes")
text.close()
output.close()

When I run the code, it finds no matches, but the print statement above the if statement prints values that look like they should match. When I go to the shell and type in something like

93701 in KFCFzip

It gives me back "True", so it works to that extent. The file is just text separated by commas, so I can't figure out why it can't see them. The data file has live data, so I would have to change it a bit before posting. I was wondering if anyone had any ideas that didn't involve posting the data itself.

Upvotes: 0

Views: 256

Answers (3)

pillmuncher

Reputation: 10162

You should use the csv module. The way you do it, if one of the fields in your file contains a comma, you're screwed.

Also, you shouldn't hide builtin names like zip. And naming your variable array just seems wrong: firstly, it refers to a list, not an array. They are not the same thing. Secondly, variable names should reflect what they refer to, not just the type of what they refer to.

import csv

KFCFzip = [[93201,93399], [93529,93535], [93601,93899], [95301,95399]]

with open('addresses.csv', 'r') as addressfile:
    for address in csv.reader(addressfile):
        zipcode = int(address[6][0:5])
        for lower, upper in KFCFzip:
            if lower <= zipcode < upper:
                print('yes')
                break
        else:
            print('no')
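To illustrate the comma-in-field problem mentioned above, here is a small sketch. The sample row is made up (it is not from the asker's file), and it assumes the zip code sits in the seventh column as in the question.

```python
import csv
import io

# Hypothetical sample row: the second field contains a comma inside quotes.
sample = 'Smith,"Acme, Inc.",12 Main St,,Fresno,CA,93701-1234\n'

# Naive splitting cuts the quoted field in two and shifts every later column.
naive_fields = sample.strip().split(",")

# csv.reader understands the quoting and keeps the columns aligned.
csv_fields = next(csv.reader(io.StringIO(sample)))

print(naive_fields[6])        # CA  (wrong column)
print(csv_fields[6][0:5])     # 93701
```

With the naive split, index 6 no longer points at the zip code, so any check on it fails without raising an error.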

Upvotes: 2

Elalfer

Reputation: 5358

Because array[6][0:5] is a string. You should convert it to an integer before looking it up in the KFCFzip list.

for line in text:
    array= line.strip().split(",")
    print(array[6][0:5])
    if int(array[6][0:5]) in KFCFzip:
        print("yes")

Another problem with this solution is performance. range creates a list of elements, so you compare every "suspected" ZIP code against every possible ZIP code. The time complexity of this algorithm is O(n*m), where n = len(KFCFzip) and m is the number of lines in the file. A better way is to create a list of ranges like:

KFCFzip = [[93201,93399], [93529,93535], [93601,93899], [95301,95399]]

for line in text:
    array= line.strip().split(",")
    zip = int(array[6][0:5])
    print(zip)
    found = False
    for r in KFCFzip:
        if zip >= r[0] and zip < r[1]:
            found = True
            break
    if found:
        print("yes")

In this case you can dramatically increase the performance.

For instance, using your data you would have 198+6+298+98 = 600 elements, so for each line you would have to make 600/2 = 300 comparisons on average. But using my algorithm you'll have only 8/2 = 4 comparisons, which is ~75 times fewer (read: faster).
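As a rough sanity check of the range-pair idea, the sketch below confirms that it gives the same answers as the flat-list lookup for a few candidate values. The helper name in_any_range and the candidate values are made up for illustration. A set built from the same numbers would be another way to avoid scanning the whole list, though that is a different technique from the one described here.

```python
# Flat list, built the same way as in the question.
zip_list = (list(range(93201, 93399)) + list(range(93529, 93535))
            + list(range(93601, 93899)) + list(range(95301, 95399)))

# The same intervals as (lower, upper) pairs; upper is exclusive, like range().
zip_ranges = [(93201, 93399), (93529, 93535), (93601, 93899), (95301, 95399)]


def in_any_range(zipcode, ranges):
    """Return True if zipcode falls in any half-open interval [lower, upper)."""
    return any(lower <= zipcode < upper for lower, upper in ranges)


# Hypothetical candidates: two inside a range, one on an exclusive bound, one far off.
for candidate in (93701, 93534, 93535, 90210):
    print(candidate, candidate in zip_list, in_any_range(candidate, zip_ranges))
```

Both columns should agree on every row, since the pairs describe exactly the intervals the list was built from.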

Upvotes: 7

chmullig

Reputation: 13436

It's probably an issue with strings vs ints. Try intifying your array[6][0:5] or stringifying your ranges.
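A quick sketch of both options, assuming a made-up field value of "93701-1234" in place of the real data:

```python
KFCFzip = (list(range(93201, 93399)) + list(range(93529, 93535))
           + list(range(93601, 93899)) + list(range(95301, 95399)))

field = "93701-1234"  # hypothetical value standing in for array[6]

# A str never compares equal to an int, so this lookup always fails.
print(field[0:5] in KFCFzip)          # False

# Option 1: convert the field to an int before the lookup.
print(int(field[0:5]) in KFCFzip)     # True

# Option 2: convert the zip codes to strings instead.
KFCFzip_str = [str(z) for z in KFCFzip]
print(field[0:5] in KFCFzip_str)      # True
```

Option 2 has the side benefit of not raising ValueError on a field that is not numeric, such as a header row.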

Upvotes: 1
