Reputation: 169
I have a dictionary and a CSV file (which is actually tab delimited):
dict1:
    {1: ['Charles', 22],
     2: ['James', 36],
     3: ['John', 18]}
data.csv:
[ 22 | Charles goes to the cinema | Activity ]
[ 46 | John is a butcher | Profession ]
[ 95 | Charles is a firefighter | Profession ]
[ 67 | James goes to the zoo | Activity ]
I want to take the string (name) in the first item of each dict1 value and search for it in the second column of the CSV. If the name appears in a sentence, I want to print the first (and only the first) matching sentence.
But I am having a problem with the searching: how do I access the column/row data while iterating through dict1? I have tried something like this:
    with open('data.csv', 'r', encoding='utf-8') as file:
        reader = csv.reader(file, delimiter='\t')
        for (id, (name, age)) in dict1.items():
            if name in reader.row[1]:  # reader.row[1] is wrong!!!
                print(reader.row[1])
Upvotes: 2
Views: 1419
Reputation: 300
Yes, roganjosh is right. A better way is to traverse the CSV file once and, for each row, look for any of the names.
    import csv

    # names still waiting for their first matching sentence
    requested = {d[0] for d in dict1.values()}
    with open('/tmp/f.csv', newline='') as csvfile:
        for row in csv.reader(csvfile, delimiter='\t'):
            sentence = row[1]
            found = {n for n in requested if n in sentence}
            for n in found:
                print(f'{n}: {sentence}')
            requested -= found
            if not requested:  # optimization, all names used
                break
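For anyone who wants to try this without a file on disk, here is a minimal sketch of the same idea, assuming the sample rows from the question. The function name first_sentences and the use of io.StringIO in place of the real file are my own choices for illustration, not part of the original answer.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for the real file.
dict1 = {1: ['Charles', 22], 2: ['James', 36], 3: ['John', 18]}
tsv_text = (
    "22\tCharles goes to the cinema\tActivity\n"
    "46\tJohn is a butcher\tProfession\n"
    "95\tCharles is a firefighter\tProfession\n"
    "67\tJames goes to the zoo\tActivity\n"
)


def first_sentences(people, tsv_file):
    """Map each name to the first sentence (second column) that contains it."""
    requested = {value[0] for value in people.values()}
    result = {}
    for row in csv.reader(tsv_file, delimiter='\t'):
        sentence = row[1]
        found = {name for name in requested if name in sentence}
        for name in found:
            result[name] = sentence
        requested -= found
        if not requested:  # every name has been matched, stop reading
            break
    return result


matches = first_sentences(dict1, io.StringIO(tsv_text))
for name, sentence in matches.items():
    print(f'{name}: {sentence}')
```

Returning a dict instead of printing inside the loop makes the result easier to reuse; the second "Charles" row is skipped because Charles has already been removed from requested.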
EDIT: this now answers the actual question, not the one I imagined.
EDIT2: after clarification (and some new requirements)... I hope this hits the mark.
It prints each sentence only once per row. It does not check whether the same sentence appears in another row. For that, you can use a set() to keep the matched sentences and print them once the CSV file has been processed.
I used a regex to match whole words rather than arbitrary substrings.
    import csv
    import re

    requested = {re.compile(r'\b' + re.escape(d[0]) + r'\b') for d in dict1.values()}
    with open('/tmp/f.csv', newline='') as csvfile:
        for row in csv.reader(csvfile, delimiter='\t'):
            sentence = row[1]
            found = {n for n in requested if n.search(sentence)}
            if found:
                requested -= found
                print(sentence)
            if not requested:
                break
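To illustrate why the word boundaries matter, here is a small sketch comparing a plain substring test with the \b regex. The "Johnson" row is a hypothetical example I added; it is not in the question's data.

```python
import re

name = 'John'  # name from the question
sentences = [
    'John is a butcher',        # from the question's data
    'Johnson sells insurance',  # hypothetical row, added to show a false positive
]

# \b matches only at a word boundary, so "John" inside "Johnson" is rejected
word = re.compile(r'\b' + re.escape(name) + r'\b')

substring_hits = [s for s in sentences if name in s]
whole_word_hits = [s for s in sentences if word.search(s)]

print(substring_hits)
print(whole_word_hits)
```

The substring test accepts both sentences, while the regex accepts only the one where John stands as a separate word.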
EDIT3: recovering the matched names (a new requirement, like in a real dev project :-P)
First, note that one sentence can match more than one name (see len(found)).
In the last example you can recover the name from the compiled regex (because r'\b' was added before and after the name):
found_names = [r.pattern[2:-2] for r in found]
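One thing to check before relying on that slice: re.escape rewrites names that contain regex metacharacters, so the slice gives back the escaped form rather than the original name. A minimal sketch, assuming a hypothetical name "Jean-Luc" that is not in the question's data:

```python
import re

name = 'Jean-Luc'  # hypothetical name, not in the question's data
regex = re.compile(r'\b' + re.escape(name) + r'\b')

# strip the leading and trailing \b (two characters each)
recovered = regex.pattern[2:-2]

print(regex.search('Jean-Luc is a pilot') is not None)  # the regex still matches
print(recovered == name)                                # but the slice differs from the name
print(recovered)
```

For purely alphabetic names such as those in the question the slice and the original name are the same, so this only matters once names contain characters like "-" or ".".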
But I don't think that's the best way.
A better way is to store the original name in requested alongside the regex. I decided to use a set of tuples. Operations on sets are very fast.
    requested = {(re.compile(r'\b' + re.escape(d[0]) + r'\b'), d[0])
                 for d in dict1.values()}
    with open('/tmp/f.csv', newline='') as csvfile:
        for row in csv.reader(csvfile, delimiter='\t'):
            sentence = row[1]
            found = {(r, n) for r, n in requested if r.search(sentence)}
            if found:
                found_names = tuple(n for r, n in found)
                print(found_names, sentence)
                requested -= found
            if not requested:
                break
Now the found names (the original d[0]) are in the tuple found_names. You can use it however you want. For example, to turn it into a string, replace the found_names = and print lines with:
    found_names = ', '.join(n for r, n in found)
    print(f'{found_names}: {sentence}')
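As a rough illustration of the "more than one name per sentence" case, here is a sketch that runs the same loop over in-memory rows. The first row is hypothetical (it is not in the question's data), and I added sorted() because set iteration order is arbitrary, so the joined names could otherwise come out in any order.

```python
import re

dict1 = {1: ['Charles', 22], 2: ['James', 36], 3: ['John', 18]}  # from the question
rows = [
    ['12', 'Charles and James go to the zoo', 'Activity'],  # hypothetical row
    ['46', 'John is a butcher', 'Profession'],               # from the question
]

# each entry pairs the compiled whole-word regex with the original name
requested = {(re.compile(r'\b' + re.escape(d[0]) + r'\b'), d[0])
             for d in dict1.values()}

results = []
for row in rows:
    sentence = row[1]
    found = {(r, n) for r, n in requested if r.search(sentence)}
    if found:
        # sorted() gives a stable order for the joined names
        found_names = ', '.join(sorted(n for r, n in found))
        results.append(f'{found_names}: {sentence}')
        requested -= found
    if not requested:
        break

print('\n'.join(results))
```

Collecting the lines in results instead of printing immediately is just a convenience for reuse; printing inside the loop, as in the answer, works the same way.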
Upvotes: 1