Lone Wallaby

Reputation: 169

How to check if an item in a dictionary exists in CSV file?

I have a dictionary and a CSV file (which is actually tab delimited):

dict1:

{1 : ['Charles', 22],
2: ['James', 36],
3: ['John', 18]}

data.csv:


[ 22 | Charles goes to the cinema | Activity    ]
[ 46 | John is a butcher          | Profession  ]
[ 95 | Charles is a firefighter   | Profession  ]
[ 67 | James goes to the zoo      | Activity    ]

I want to take the string (name) in the first item of dict1's value and search for it in the second column of the csv. If the name appears in the sentence, I want to print the first (and only the first) sentence.

But I am having a problem with the searching: how do I access the column/row data while iterating through dict1? I have tried something like this:

with open('data.csv', 'r', encoding='utf-8') as file:
    reader = csv.reader(file, delimiter='\t')
    for (id, (name, age)) in dict1.items():
        if name in reader.row[1] # reader.row[1] is wrong!!!
        print(reader.row[1])

Upvotes: 2

Views: 1419

Answers (1)

rysson

Reputation: 300

Yes, roganjosh is right. A better way is to traverse the CSV file once and look for any of the names in each row.

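import csv

# Names to look for: the first element of each value in dict1.
requested = {d[0] for d in dict1.values()}
with open('/tmp/f.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter='\t'):
        sentence = row[1]
        # Names that occur in this row's sentence (plain substring test).
        found = {n for n in requested if n in sentence}
        for n in found:
            print(f'{n}: {sentence}')
        requested -= found  # each name is reported only once
        if not requested:  # optimization: all names used
            break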

EDIT: answering the actual question, not what I imagined it to be.


EDIT2: after clarification (and some new requirements)... I hope this hits the mark.

This prints a sentence only once per row. It does not check whether the same sentence appears in another row. You can use a set() to keep the matched sentences and print them once the CSV file has been processed.

I used a regex to match whole words rather than any substring.
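
To illustrate the difference, here is a minimal sketch. The sentence 'Johnson is a baker' is made up for the demonstration and does not come from the asker's data.

```python
import re

name = 'John'
sentence = 'Johnson is a baker'  # hypothetical row, not from data.csv

# Plain substring test: succeeds because 'John' is inside 'Johnson'.
substring_hit = name in sentence

# Whole-word test: \b requires a word boundary on both sides of the name.
word_hit = re.search(r'\b' + re.escape(name) + r'\b', sentence) is not None

print(substring_hit, word_hit)  # True False
```

The re.escape() call keeps any regex metacharacters in a name from being interpreted as part of the pattern.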

import csv
import re

requested = {re.compile(r'\b' + re.escape(d[0]) + r'\b') for d in dict1.values()}
with open('/tmp/f.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter='\t'):
        sentence = row[1]
        found = {n for n in requested if n.search(sentence)}
        if found:
            requested -= found
            print(sentence)
        if not requested:
            break
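
To try the snippet above without a file on disk, the same logic can be wrapped in a function and fed an in-memory stand-in for the CSV. This is only a sketch: the function name first_sentences, the io.StringIO sample and the check for short rows are assumptions added for illustration. It returns the sentences instead of printing them, so the caller can post-process them (for example, collect them in a set()).

```python
import csv
import io
import re


def first_sentences(names, lines, delimiter='\t'):
    """Return the first sentence mentioning each name, in file order."""
    requested = {re.compile(r'\b' + re.escape(n) + r'\b') for n in names}
    sentences = []
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 2:  # skip blank or malformed rows
            continue
        sentence = row[1]
        found = {r for r in requested if r.search(sentence)}
        if found:
            requested -= found
            sentences.append(sentence)
        if not requested:  # every name has been matched
            break
    return sentences


dict1 = {1: ['Charles', 22], 2: ['James', 36], 3: ['John', 18]}
# In-memory stand-in for the tab-delimited file from the question.
sample = io.StringIO(
    "22\tCharles goes to the cinema\tActivity\n"
    "46\tJohn is a butcher\tProfession\n"
    "95\tCharles is a firefighter\tProfession\n"
    "67\tJames goes to the zoo\tActivity\n"
)
result = first_sentences((v[0] for v in dict1.values()), sample)
print(result)
```

With the question's data this yields one sentence each for Charles, John and James; the second Charles row is skipped because his name has already been matched.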

EDIT3: recovering the matched names (new requirement – like in a real dev project :-P)

First, a single row can match more than one name (see len(found)).

In the last example you can recover the name from the compiled regex (because r'\b' was added before and after the name):

found_names = [r.pattern[2:-2] for r in found]

But I don't think that's the best way.

A better way is to store the original name alongside the regex in requested. I decided to use a set of tuples. Operations on sets are very fast.
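
One reason the slicing trick is fragile: re.escape() may change the name before it is stored in the pattern. A small sketch, using a hypothetical name that is not in dict1:

```python
import re

name = 'Mary-Ann'  # hypothetical name; '-' gets escaped by re.escape()
pattern = re.compile(r'\b' + re.escape(name) + r'\b')

# Strip the leading and trailing \b (two characters each).
recovered = pattern.pattern[2:-2]

print(recovered == name)  # False: the recovered text still contains the backslash
```

For plain alphabetic names such as those in dict1 the slice gives back the original, so the problem only shows up with names containing characters that re.escape() escapes.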

requested = {(re.compile(r'\b' + re.escape(d[0]) + r'\b'), d[0])
             for d in dict1.values()}
with open('/tmp/f.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter='\t'):
        sentence = row[1]
        found = {(r, n) for r, n in requested if r.search(sentence)}
        if found:
            found_names = tuple(n for r, n in found)
            print(found_names, sentence)
            requested -= found
        if not requested:
            break

Now the found names (the original d[0]) are in the tuple found_names. You can use it as you want. For example, to turn it into a string, replace the found_names = and print lines with:

found_names = ', '.join(n for r, n in found)
print(f'{found_names}: {sentence}')
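
As an alternative (not what the code above does), the names can be combined into a single pattern with a capture group, so the matched names come straight out of findall() and no tuples are needed. This is a sketch under assumptions: the variable names and the io.StringIO sample standing in for the file are illustrative only.

```python
import csv
import io
import re

dict1 = {1: ['Charles', 22], 2: ['James', 36], 3: ['John', 18]}
names = [v[0] for v in dict1.values()]

# One pattern for all names; the capture group holds the name as written.
any_name = re.compile(r'\b(' + '|'.join(re.escape(n) for n in names) + r')\b')

# In-memory stand-in for the tab-delimited file from the question.
sample = io.StringIO(
    "22\tCharles goes to the cinema\tActivity\n"
    "46\tJohn is a butcher\tProfession\n"
    "95\tCharles is a firefighter\tProfession\n"
    "67\tJames goes to the zoo\tActivity\n"
)

remaining = set(names)  # names that have not been matched yet
results = []
for row in csv.reader(sample, delimiter='\t'):
    sentence = row[1]
    # Names found in this sentence that are still outstanding.
    hits = set(any_name.findall(sentence)) & remaining
    if hits:
        results.append((sorted(hits), sentence))
        remaining -= hits
    if not remaining:
        break

for hit_names, sentence in results:
    print(f"{', '.join(hit_names)}: {sentence}")
```

Because findall() returns non-overlapping matches, this variant could miss a name when multi-word names overlap in the text; for single-word names like the ones in the question that does not arise.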

Upvotes: 1
