if row[0] in row[1] print row

Question

I have a CSV file with two columns. For each row, I am trying to check whether the row[0] value appears anywhere in the second column (row[1]) and, if so, print the row.

Items in csv file:

COL1,   COL2
1-A,    1-A
1-B,    2-A
2-A,    1-B
2565,   2565
51Bc,   51Bc
5161,   56
811,    65
681,    11
55,     3
3,      55

Code:

import csv
doc= csv.reader(open('file.csv','rb'))

for row in doc:
    if row[0] in row[1]:
        print row[0]

The end result should be:

1-A
1-B
2-A
2565
51Bc
55
3

Instead, it is giving me:

1-A
2565
51Bc

It prints only those values because they sit side by side in the same row. What I need is to take each item in COL1, check whether it appears anywhere in the entire COL2 column, and print it if it does, not just check whether the two values are beside each other.

TheSoundDefense · Accepted Answer

When you say for row in doc, each iteration reads only one pair of elements into row, so row[1] can never hold the entire column at any point in time. You need an initial loop to collect that column, then a second pass to do the comparison. You could loop through the CSV file twice, but it's simpler to store both columns in separate collections so you only have to open the file once.
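To make the difference concrete, here is a minimal sketch (written for Python 3, unlike the Python 2 snippets on this page) that compares the same `in` test against a single row's second field and against the whole second column. The variable names `same_row_match` and `column_match` are made up for illustration; the values are copied from the question's data.

```python
# One parsed row from the question's file: both fields are plain strings.
row = ["55", "3"]

# On a string, `in` is a substring test, and it only sees this one row's value.
same_row_match = row[0] in row[1]
print(same_row_match)    # False: "55" is not a substring of "3"

# The whole second column, collected across all rows (values from the question).
second_col = {"1-A", "2-A", "1-B", "2565", "51Bc", "56", "65", "11", "3", "55"}

# On a set, `in` asks whether the exact value is one of the elements.
column_match = row[0] in second_col
print(column_match)      # True: "55" appears in COL2 on another row
```

Because the string form is a substring test, it can also produce false positives (for example, "5" in "55" is true), which is another reason to compare against a collection of whole values.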

import csv
doc= csv.reader(open('file.csv','rb'))

# Build the collections: a list for column 1, a set for column 2.
first_col = []
second_col = set()
for row in doc:
    first_col.append(row[0])
    second_col.add(row[1])

# Now actually do the comparison.
for item in first_col:
    if item in second_col:
        print item

As per abarnert's suggestion, we're using a set() for the second column. Sets are optimized for checking whether a value is present, which is all we do with second_col. A list keeps its elements in order and is well suited to looping through every element, which is what we do with first_col, so a list makes more sense there.
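For readers on Python 3, the same idea might look roughly like the sketch below. It is not the original answer's code: the function name `matches_in_second_column` is hypothetical, the sample data is pasted in from the question so the block runs on its own, and it assumes the file really has a header row and padding spaces after each comma (hence `skipinitialspace=True` and the skipped first row).

```python
import csv
import io

# Sample data copied from the question. A real script would instead use
# open('file.csv', newline='') and pass the file object to the function.
SAMPLE = """\
COL1,   COL2
1-A,    1-A
1-B,    2-A
2-A,    1-B
2565,   2565
51Bc,   51Bc
5161,   56
811,    65
681,    11
55,     3
3,      55
"""


def matches_in_second_column(lines):
    """Return the COL1 values that appear anywhere in COL2, in file order."""
    # Assumption: fields are padded with spaces after the comma.
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # assumption: the first row is a header, so skip it

    first_col = []      # list: keeps the original order for output
    second_col = set()  # set: fast membership tests
    for row in reader:
        if len(row) < 2:
            continue    # ignore blank or short lines
        first_col.append(row[0])
        second_col.add(row[1])

    return [item for item in first_col if item in second_col]


result = matches_in_second_column(io.StringIO(SAMPLE))
print("\n".join(result))
```

On the question's data this prints the seven values listed as the desired result, from 1-A through 3.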
