Reputation: 6333
I am trying to find entries with specific zip codes in a mailing list (CSV format). I thought this should work, but it never finds anything, even though I know the sought-after zip codes are there.
text = open("during1.txt","r")
a = list(range(93201,93399))
b = list(range(93529,93535))
c = list(range(93601,93899))
d = list(range(95301,95399))
KFCFzip = a+b+c+d
output = open("output.txt","w")
for line in text:
    array= line.strip().split(",")
    print(array[6][0:5])
    if array[6][0:5] in KFCFzip:
        #output.write(array)
        print("yes")
text.close()
output.close()
When I run the code, it finds no matches, but the print statement above the IF statement prints out values that look like they should be matches, and when I go to the Shell and type in something like
93701 in KFCFzip
It gives me back "True", so it works to that extent. The file is just text separated by commas, so I can't figure out why it can't see them. The data file has live data, so I would have to change it a bit before posting. I was wondering if anyone had any ideas that didn't involve posting the data itself.
Upvotes: 0
Views: 256
Reputation: 10162
You should use the csv module. The way you do it, if one of the fields in your file contains a comma, split(",") will cut that field in two and every later index will be off by one.
Also, you shouldn't hide builtin names like zip. And naming your variable array just seems wrong: firstly, it refers to a list, not an array. They are not the same thing. Secondly, variable names should reflect what they refer to, not just the type of what they refer to.
import csv
KFCFzip = [[93201,93399], [93529,93535], [93601,93899], [95301,95399]]
with open('addresses.csv', 'r') as addressfile:
    for address in csv.reader(addressfile):
        zipcode = int(address[6][0:5])
        for lower, upper in KFCFzip:
            if lower <= zipcode < upper:
                print('yes')
                break
        else:
            # for/else: runs only if no range matched (no break)
            print('no')
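To illustrate why the csv module matters here, below is a minimal sketch comparing split(",") with csv.reader on a single row. The sample row and its field layout are made up for the demonstration (the real file is not shown in the question), so treat the column positions as an assumption.

```python
import csv
import io

# Hypothetical sample row: the street field contains a comma inside quotes.
sample = 'Jane Doe,"123 Main St, Apt 4",Fresno,CA,93701-1234\n'

# Naive splitting treats the comma inside the quotes as a separator.
naive_fields = sample.strip().split(",")

# csv.reader honours the quotes and keeps the address as one field.
csv_fields = next(csv.reader(io.StringIO(sample)))

print(len(naive_fields))     # 6: the quoted address was split in two
print(len(csv_fields))       # 5: the quoted address stays one field
print(naive_fields[4][0:5])  # CA: the index no longer points at the ZIP
print(csv_fields[4][0:5])    # 93701
```

With naive splitting, every index after the quoted field shifts by one, so the same index can land on the state instead of the ZIP code.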
Upvotes: 2
Reputation: 5358
Because array[6][0:5] is a string, while KFCFzip contains integers, and a string never compares equal to an integer. You should convert it to an integer before looking it up in the KFCFzip list.
for line in text:
    array= line.strip().split(",")
    print(array[6][0:5])
    if int(array[6][0:5]) in KFCFzip:
        print("yes")
Another problem with this solution is the performance. list(range(...)) creates a list holding every element, so you are going to compare every "suspected" ZIP code against every possible ZIP code. The time complexity of this algorithm is O(n*m), where n = len(KFCFzip) and m is the number of lines in the file. A better way is to create a list of ranges, like:
KFCFzip = [[93201,93399], [93529,93535], [93601,93899], [95301,95399]]
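for line in text:
    array= line.strip().split(",")
    # renamed from zip to avoid hiding the builtin
    zipcode = int(array[6][0:5])
    print(zipcode)
    found = False
    for r in KFCFzip:
        # each range is half-open: lower bound included, upper excluded
        if zipcode >= r[0] and zipcode < r[1]:
            found = True
            break
    if found:
        print("yes")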
In this case you can dramatically increase the performance.
For instance, using your data you would have 198+6+298+98 = 600 elements, so for each matching line you would have to make 600/2 = 300 comparisons on average (and all 600 for a line that doesn't match). Using my algorithm you'll have at most 8 comparisons (two per range), or about 8/2 = 4 on average, which is roughly 75 times fewer (read: faster).
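As a variation on the same idea, here is a minimal sketch that assumes Python 3, where a range object can test integer membership without building a list. The name in_service_area and the sample field value are hypothetical, chosen only for illustration.

```python
# Half-open ranges with the same bounds as in the question.
KFCF_RANGES = [
    range(93201, 93399),
    range(93529, 93535),
    range(93601, 93899),
    range(95301, 95399),
]

def in_service_area(zipcode):
    """Return True if the integer zipcode falls in any of the ranges."""
    # In Python 3, `int in range(...)` is computed arithmetically,
    # so at most four cheap checks are made per ZIP code.
    return any(zipcode in r for r in KFCF_RANGES)

# Total number of ZIP codes the expanded lists would have held.
print(sum(len(r) for r in KFCF_RANGES))  # 600

print(in_service_area(93701))  # True
print(in_service_area(93399))  # False: the upper bound is excluded

# Hypothetical raw field, converted to int before the lookup.
print(in_service_area(int("93701-1234"[0:5])))  # True
```

This keeps the four pairs of bounds as the only data, which also makes it harder to mistype one of the comparisons.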
Upvotes: 7
Reputation: 13436
It's probably an issue with strings vs. ints. Try intifying your array[6][0:5] or stringifying your ranges.
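A minimal sketch of the "stringify the ranges" option, assuming the ZIP field always starts with five digits. The name KFCF_ZIPS and the sample field value are made up for the example. Using a set instead of a list is a swapped-in choice that makes each lookup a hash check rather than a scan.

```python
from itertools import chain

# Build the ZIP codes as strings so they compare equal to text read
# from the file. Bounds are the same half-open ranges as in the question.
KFCF_ZIPS = {
    str(z)
    for z in chain(
        range(93201, 93399),
        range(93529, 93535),
        range(93601, 93899),
        range(95301, 95399),
    )
}

field = "93701-1234"  # hypothetical value of the ZIP column

print(field[0:5] in KFCF_ZIPS)  # True: string compared with strings
print(93701 in KFCF_ZIPS)       # False: an int never equals a str

# The original symptom: a string looked up in a list of ints.
print("93701" in list(range(93601, 93899)))  # False
```

Keeping ZIP codes as strings also preserves leading zeros, which int() would drop for ZIP codes in other regions.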
Upvotes: 1