Reputation: 73
I have a CSV file with two columns. I am trying to find out whether each row[0] value appears anywhere in the row[1] column and, if so, print it.
Items in csv file:
COL1, COL2
1-A, 1-A
1-B, 2-A
2-A, 1-B
2565, 2565
51Bc, 51Bc
5161, 56
811, 65
681, 11
55, 3
3, 55
Code:
import csv
doc = csv.reader(open('file.csv', 'rb'))
for row in doc:
    if row[0] in row[1]:
        print row[0]
The end result should be:
1-A
1-B
2-A
2565
51Bc
55
3
Instead, it is giving me:
1-A
2565
51Bc
It prints those values because they sit side by side in the same row. What I need is to take each item in COL1, check whether it appears anywhere in the entire COL2 column, and print it if it does, rather than only checking whether the two values are next to each other.
Upvotes: 2
Views: 33827
Reputation: 6945
When you say for row in doc, each iteration only gets one pair of elements and puts them in row. So there is no way row[1] can hold the entire column at any point in time. You need an initial loop to collect that column, then a second loop through the CSV file to do the comparison. Actually, you could store both columns in separate collections as you read, and then you only have to open the file once.
import csv
doc = csv.reader(open('file.csv', 'rb'))
# Build the collections.
first_col = []
second_col = set()
for row in doc:
    first_col.append(row[0])
    second_col.add(row[1])
# Now actually do the comparison.
for item in first_col:
    if item in second_col:
        print item
As per abarnert's suggestion, we're using a set() for the second column. Sets are optimized for checking whether a value is present, which is all we do with second_col. A list keeps its elements in order and is well suited to looping through every one of them, which is what we do with first_col, so a list makes more sense there.
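For anyone on Python 3, here is a rough sketch of the same list-plus-set idea. It is not part of the original code: the function name values_in_second_column and the SAMPLE string are made up for illustration, io.StringIO stands in for file.csv, and it assumes the file has a header row and a space after each comma, as in the question's listing.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for file.csv.
SAMPLE = """COL1, COL2
1-A, 1-A
1-B, 2-A
2-A, 1-B
2565, 2565
51Bc, 51Bc
5161, 56
811, 65
681, 11
55, 3
3, 55
"""


def values_in_second_column(lines):
    """Return first-column values that appear anywhere in the second column."""
    # skipinitialspace=True drops the space that follows each comma,
    # so ' 1-A' is read as '1-A' and exact matching works.
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # assumption: the first row is a header, so skip it

    first_col = []      # list: keeps the original order for the output
    second_col = set()  # set: fast membership tests
    for row in reader:
        if len(row) < 2:
            continue    # ignore blank or malformed rows
        first_col.append(row[0])
        second_col.add(row[1])

    return [item for item in first_col if item in second_col]


matches = values_in_second_column(io.StringIO(SAMPLE))
print("\n".join(matches))
```

With the sample data this prints the seven values the question expects (1-A, 1-B, 2-A, 2565, 51Bc, 55, 3). To read a real file, you could pass open('file.csv', newline='') instead of the StringIO object.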
Upvotes: 3