Athi

Reputation: 391

Get rows from CSV by matching header to multiple dictionary key-values

I have a CSV file with a header row, and I want to retrieve all the rows whose values match the key-value pairs in a dictionary. Note that the dictionary can contain any number of arbitrary keys and values to match against.

Here is the code I have written to solve this. Is there a better way to approach it (other than a pandas DataFrame)?

By "better" I mean: removing any unnecessary variables, using a better data structure or library, or reducing space/time complexity compared with the solution below.

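import csv

def get_matching_rows(options):
    output = []
    with open("data.csv", "rt", newline="") as csvfile:
        data = csv.reader(csvfile, delimiter=',', quotechar='"')
        header = next(data)
        for row in data:
            # Count how many of the requested header/value pairs this row satisfies.
            match = 0
            for k, v in options.items():
                match += 1 if row[header.index(k)] == v else 0
            # Keep the row only if every pair matched.
            if len(options.keys()) == match:
                output.append(dict(zip(header, row)))
    return output

options = {'h1': 'v1', 'h2': 'v2'}
print(get_matching_rows(options))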

Upvotes: 0

Views: 1081

Answers (2)

tdelaney

Reputation: 77347

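You can use a list comprehension to read and filter the rows of a DictReader. Make the wanted options a set, and then it's an easy subset test: a row matches when every wanted pair is among its items. (A plain intersection, `set(row.items()) & wanted`, would also accept rows where only one of the pairs matches.)

import csv

def test():
    options = {'h1': 'v1', 'h2': 'v2'}
    wanted = set(options.items())
    with open("data.csv", "rt", newline="") as csvfile:
        # wanted <= items is True only when every wanted pair is in the row
        return [row for row in csv.DictReader(csvfile) if wanted <= set(row.items())]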

print(test())
print(len(test()))
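
The choice of set operation decides whether a row must match every key-value pair or just one of them. Here is a minimal sketch of the difference, using made-up in-memory data in place of data.csv; the function names `rows_matching_any` and `rows_matching_all` are hypothetical and only serve the illustration:

```python
import csv
import io

# Hypothetical sample data standing in for data.csv.
SAMPLE = "h1,h2,h3\nv1,v2,a\nv1,x,b\ny,v2,c\nv1,v2,d\n"


def rows_matching_any(text, options):
    """Rows where at least one wanted pair is present (truthy intersection)."""
    wanted = set(options.items())
    return [row for row in csv.DictReader(io.StringIO(text))
            if set(row.items()) & wanted]


def rows_matching_all(text, options):
    """Rows where every wanted pair is present (subset test)."""
    wanted = set(options.items())
    return [row for row in csv.DictReader(io.StringIO(text))
            if wanted <= set(row.items())]


options = {'h1': 'v1', 'h2': 'v2'}
print(len(rows_matching_any(SAMPLE, options)))  # 4: every row has h1=v1 or h2=v2
print(len(rows_matching_all(SAMPLE, options)))  # 2: only rows with both pairs
```

If the goal is "all pairs must match", as in the question, the subset form is the one that fits.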

Upvotes: 0

martineau

Reputation: 123473

You don't say what you would consider a "better" approach to be. That said, it would take fewer lines of code if you used a csv.DictReader to process the input file as illustrated.

import csv


def find_matching_rows(filename, criteria, delimiter=',', quotechar='"'):
    criteria_values = tuple(criteria.values())
    matches = []
    with open(filename, 'r', newline='') as csvfile:
        for row in csv.DictReader(csvfile, delimiter=delimiter, quotechar=quotechar):
            if tuple(row[key] for key in criteria) == criteria_values:
                matches.append(row)
    return matches


results = find_matching_rows('matchtest.csv', {'h1': 'v1', 'h2': 'v2'})
for row in results:
    print(row)
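
If memory use matters (the question mentions space complexity), one possible variation is to yield matching rows lazily instead of collecting them in a list. This is only a sketch under assumptions: `iter_matching_rows` is a hypothetical name, it takes an already opened file object rather than a filename, and it treats a missing column as a non-match rather than an error.

```python
import csv
import io


def iter_matching_rows(csvfile, criteria, **reader_kwargs):
    """Yield each row (as a dict) whose values equal every entry in criteria."""
    for row in csv.DictReader(csvfile, **reader_kwargs):
        # all() short-circuits on the first mismatch; .get() turns a
        # missing column into a non-match instead of raising KeyError.
        if all(row.get(key) == value for key, value in criteria.items()):
            yield row


# Made-up in-memory data standing in for a real file.
sample = io.StringIO("h1,h2,h3\nv1,v2,a\nv1,x,b\nv1,v2,c\n")
for row in iter_matching_rows(sample, {'h1': 'v1', 'h2': 'v2'}):
    print(row['h3'])
```

Only one row is held at a time, so the caller decides whether to build a list, count the matches, or stop early.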

Upvotes: 1
