Reputation: 391
I have a CSV file with a header row, and I want to retrieve all the rows whose values match every key-value pair in a dictionary. Note that the dictionary can contain any number of arbitrary keys and values to match against.
Here is the code I have written to solve this. Is there a better way to approach it (other than a pandas DataFrame)?
By "better" I mean: removing unnecessary variables, if any; a better data structure or library; or lower space/time complexity than the solution below.
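import csv

options = {'h1': 'v1', 'h2': 'v2'}

def find_rows(options):
    output = []
    with open("data.csv", "rt") as csvfile:
        data = csv.reader(csvfile, delimiter=',', quotechar='"')
        header = next(data)  # first row holds the column names
        for row in data:
            # count how many of the wanted key/value pairs this row satisfies
            match = 0
            for k, v in options.items():
                match += 1 if row[header.index(k)] == v else 0
            # keep the row only if every pair matched
            if len(options.keys()) == match:
                output.append(dict(zip(header, row)))
    return output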
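One way to address the time-complexity and unnecessary-variable points, sketched under the assumption that every key in options is a column in the header (header.index() raises ValueError otherwise): resolve each column index once before the loop instead of calling header.index() for every cell, let all() stop at the first mismatch instead of counting matches, and yield rows from a generator so the caller decides whether to build a list. match_rows is a hypothetical name, and the io.StringIO sample stands in for data.csv.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def match_rows(lines: Iterable[str], options: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield rows (as dicts) whose columns equal every value in `options`."""
    reader = csv.reader(lines, delimiter=',', quotechar='"')
    header = next(reader)
    # Resolve each criterion's column index once, not once per row.
    checks = [(header.index(k), v) for k, v in options.items()]
    for row in reader:
        # all() stops at the first column that does not match.
        if all(row[i] == v for i, v in checks):
            yield dict(zip(header, row))


# Illustrative in-memory data in place of "data.csv".
sample = io.StringIO("h1,h2,h3\nv1,v2,a\nv1,x,b\nv1,v2,c\n")
matches = list(match_rows(sample, {'h1': 'v1', 'h2': 'v2'}))
print(matches)
```

With the real file, open it with newline='' as the csv module documentation recommends, e.g. with open("data.csv", newline="") as f: output = list(match_rows(f, options)).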
Upvotes: 0
Views: 1081
Reputation: 77347
You can use a list comprehension to read and filter the rows of a DictReader. Make the wanted options a set, and then it's an easy subset test: a row matches when every wanted pair appears among its items.
import csv

def test():
    options = {'h1': 'v1', 'h2': 'v2'}
    wanted = set(options.items())
    with open("data.csv", "rt", newline="") as csvfile:
        # keep a row only if every wanted (column, value) pair is among its items
        return [row for row in csv.DictReader(csvfile) if wanted <= set(row.items())]

print(test())
print(len(test()))
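The choice of set operator decides the semantics: & (intersection) is non-empty if any single pair matches, while <= (subset) holds only when all pairs match, which is what the question's counting loop checks. A small in-memory check, using hand-written dicts shaped like DictReader rows (the values are made up for illustration):

```python
# Hand-written rows shaped like csv.DictReader output (illustrative values).
options = {'h1': 'v1', 'h2': 'v2'}
wanted = set(options.items())

full_match = {'h1': 'v1', 'h2': 'v2', 'h3': 'a'}
partial_match = {'h1': 'v1', 'h2': 'x', 'h3': 'b'}

# Intersection: non-empty as soon as ANY one pair matches.
any_pair = bool(set(partial_match.items()) & wanted)
# Subset: true only when EVERY wanted pair is present.
all_pairs_full = wanted <= set(full_match.items())
all_pairs_partial = wanted <= set(partial_match.items())

print(any_pair)           # True
print(all_pairs_full)     # True
print(all_pairs_partial)  # False
```

One caveat for either test: set(row.items()) needs hashable values. DictReader values are strings, but a row with more fields than the header collects the extras in a list under the restkey (None by default), and hashing that list raises TypeError.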
Upvotes: 0
Reputation: 123473
You don't say what you would consider a "better" approach to be. That said, it would take fewer lines of code if you used a csv.DictReader to process the input file, as illustrated:
import csv

def find_matching_rows(filename, criteria, delimiter=',', quotechar='"'):
    # Iterating the same dict twice yields keys in the same order,
    # so this tuple lines up with the per-row tuple built below.
    criteria_values = tuple(criteria.values())
    matches = []
    with open(filename, 'r', newline='') as csvfile:
        for row in csv.DictReader(csvfile, delimiter=delimiter, quotechar=quotechar):
            if tuple(row[key] for key in criteria) == criteria_values:
                matches.append(row)
    return matches

results = find_matching_rows('matchtest.csv', {'h1': 'v1', 'h2': 'v2'})
for row in results:
    print(row)
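Building the matches list holds every matching row in memory at once. If the input is large, a generator variant can stream rows instead. This sketch also swaps the tuple comparison for all() with row.get(), so a criterion naming a column that is not in the header makes rows fail to match rather than raising KeyError; whether that is preferable is a design choice. iter_matching_rows is a hypothetical name, and io.StringIO stands in for matchtest.csv.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def iter_matching_rows(csvfile: TextIO, criteria: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Lazily yield DictReader rows whose values equal every entry in `criteria`."""
    for row in csv.DictReader(csvfile):
        # row.get() returns None for a criterion column absent from the header,
        # so such a row simply fails to match instead of raising KeyError.
        if all(row.get(key) == value for key, value in criteria.items()):
            yield row


# io.StringIO stands in for a real file opened with newline=''.
sample = io.StringIO('h1,h2,h3\nv1,v2,a\nv2,v1,b\nv1,v2,"c, quoted"\n')
for row in iter_matching_rows(sample, {'h1': 'v1', 'h2': 'v2'}):
    print(row)
```

With a real file, the with block must stay open while the generator is consumed: with open('matchtest.csv', newline='') as f: for row in iter_matching_rows(f, {'h1': 'v1', 'h2': 'v2'}): print(row).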
Upvotes: 1