Reputation: 57
I am trying to write a loop that searches a CSV file for rows that share the same 3rd and 4th columns and performs an operation on them.
The file I have looks like this:
name1,x,y,z,notes
name2,a,b,c,notes
name3,a,y,z,notes
I am using code that reads the first line, identifies row[2] and row[3], and then searches all rows in the file for that combination of columns. Unfortunately, I can't figure out how to actually search for them.
for row in csvfile:
    row_identify = row[2:3]
    for row in csvfile:
        if row_identify in row:
            print row
        else:
            print "not here"
I want it to print the first and third rows (since y and z would be row_identify). I assumed I could just explicitly state that I wanted to search for those columns, but that doesn't seem to work. I also tried using
row_identify = str(row[2]),str(row[3])
but that doesn't seem to work either.
Upvotes: 5
Views: 617
Reputation: 114035
If you are looking to identify rows with the same 3rd and 4th columns as the first row:
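import csv
import operator

# key(row) returns the tuple (row[2], row[3])
key = operator.itemgetter(2,3)

with open('path/to/input') as infile:
    rows = csv.reader(infile)
    # next() consumes the first row, so the loop below starts at the second row
    holyGrail = key(next(rows))
    for row in rows:
        if key(row) != holyGrail:
            continue
        do_stuff(row)  # placeholder for whatever operation you need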
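A minimal self-contained sketch of the same idea, assuming Python 3 and the three sample rows from the question held in memory instead of a file. The names `SAMPLE` and `rows_matching_first` are hypothetical and only there for illustration. Unlike the loop above, this version also returns the first row itself. It also shows the two likely reasons the original attempt fails: `row[2:3]` is a one-element list holding only the 3rd column, and looping over the same file object a second time finds it already exhausted.

```python
import csv
import io

# The sample data from the question, held in memory for the sake of the example.
SAMPLE = """name1,x,y,z,notes
name2,a,b,c,notes
name3,a,y,z,notes
"""

def rows_matching_first(infile):
    """Return every row whose 3rd and 4th columns equal those of the first row."""
    # Read everything once: a file object cannot be iterated twice without rewinding.
    rows = list(csv.reader(infile))
    if not rows:
        return []
    # row[2:4] covers indices 2 and 3; row[2:3] would contain only index 2.
    target = tuple(rows[0][2:4])
    return [row for row in rows if tuple(row[2:4]) == target]

matches = rows_matching_first(io.StringIO(SAMPLE))
for row in matches:
    print(row)
```

With a real file, passing the object returned by `open(...)` in place of `io.StringIO(SAMPLE)` should behave the same way.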
If you'd like a more generalized version that clusters all rows sharing the same 3rd and 4th columns, then:
import csv
import operator
from collections import defaultdict as dd

key = operator.itemgetter(2,3)   # (3rd column, 4th column)
info = operator.itemgetter(0,1)  # (1st column, 2nd column)
similarities = dd(list)

with open('path/to/input') as infile:
    for i,row in enumerate(csv.reader(infile)):
        # store the row index together with its first two columns
        similarities[key(row)].append((i,info(row)))

for k, rows in similarities.items():
    print("These following rows all have the id <{}> (the data follows):".format(k), ', '.join([str(i) for i,_ in rows]))
    # each stored row is already a tuple of strings, so join it directly
    print('\n'.join(['\t' + '\t'.join(row) for _,row in rows]))
Upvotes: 0
Reputation: 52223
You can create a dictionary of pairs where keys are tuples containing identifying columns and values are the list of similar rows:
>>> import collections
>>> similarities = collections.defaultdict(list)
>>> for row in csvfile:
...     similarities[(row[2], row[3])].append(row)
...
>>> print similarities
{('y', 'z'): [['name1', 'x', 'y', 'z', 'notes'],
              ['name3', 'a', 'y', 'z', 'notes']],
 ('b', 'c'): [['name2', 'a', 'b', 'c', 'notes']]
}
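For anyone who wants to run this end to end, here is a rough sketch under a few assumptions: Python 3, `csv.reader` doing the parsing, and the question's sample rows supplied through `io.StringIO` rather than a file. `SAMPLE`, `group_by_columns` and `shared` are hypothetical names introduced only for this example.

```python
import collections
import csv
import io

# The sample data from the question, held in memory for the sake of the example.
SAMPLE = """name1,x,y,z,notes
name2,a,b,c,notes
name3,a,y,z,notes
"""

def group_by_columns(infile, columns=(2, 3)):
    """Map each combination of the chosen columns to the list of rows that have it."""
    groups = collections.defaultdict(list)
    for row in csv.reader(infile):
        groups[tuple(row[i] for i in columns)].append(row)
    return dict(groups)  # plain dict, so missing keys raise KeyError again

groups = group_by_columns(io.StringIO(SAMPLE))

# Keep only the combinations that occur in more than one row.
shared = {k: rows for k, rows in groups.items() if len(rows) > 1}
for k, rows in shared.items():
    print(k, rows)
```

Filtering on `len(rows) > 1` is one way to get just the rows that actually share their 3rd and 4th columns with another row; drop that step if single-row groups matter too.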
Upvotes: 4