TheKassaye
TheKassaye

Reputation: 41

How to search CSV file with multiple search criteria and print row?

I have a .csv file with about 1000 rows which looks like:

id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,[email protected],Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,[email protected],Female,189.106.181.194,26/06/1989
3,Laverna,Hamlen,[email protected],Female,52.165.62.174,24/04/1990
4,Gawen,Gillfillan,[email protected],Male,83.249.190.232,31/10/1984
5,Syd,Gilfether,[email protected],Male,180.153.199.106,11/07/1995

What I have for code so far will ask for input, then go over each row and print the row if it contains the input. Looks like so:

import csv

# Asks for search criteria from user

search = input("Enter search criteria:\n")

# Opens csv data file

file = csv.reader(open("MOCK_DATA.csv"))

# Go over each row and print it if it contains user input.

for row in file:
    if search in row:
        print(row)

What I want for end result, and what I'm stuck on, is to be able to enter more that one search criteria seperated by a "," and it will search and print those rows. Kind of like a way to filter the list.

for expample if there was multiple "David" that are "Male" in the file. I could enter : David, Male

It would then print all the rows that match but ignore those with a "David" thats is "Female".

Upvotes: 4

Views: 3161

Answers (3)

drootang
drootang

Reputation: 2503

You can split the input on the comma then check to make sure each field from the input is present on a given line using all() and list comprehensions.

This example uses a simplistic splitting of the input, and doesn't care which field each input matches. If you want to only match to specific columns, look into using csv.DictReader instead of csv.reader.

import csv
# Asks for search criteria from user
search_parts = input("Enter search criteria:\n").split(",")
# Opens csv data file
file = csv.reader(open("MOCK_DATA.csv"))
# Go over each row and print it if it contains user input.
for row in file:
    if all([x in row for x in search_parts]):
        print(row)

Upvotes: 1

Ajax1234
Ajax1234

Reputation: 71451

While you could just check if the strings "David" and "Male" exist in a row, it would not be very precise should you need to check column values. Instead, read in the data via csv and create a list of namedtuple objects that store the search value and header name:

from collections import namedtuple
import csv
data = list(csv.reader(open('filename.csv')))
search = namedtuple('search', 'value,header')
searches = [search(i, data[0].index(b)) for i, b in zip(input().split(', '), ['first_name', 'gender'])]
final_results = [i for i in data if all(c.value == i[c.header] for c in searches)]

Upvotes: 0

jpp
jpp

Reputation: 164693

If you are happy to use a 3rd party library, this is possible with pandas.

I have modified your data slightly to demonstrate a simple query.

import pandas as pd
from io import StringIO

mystr = StringIO("""id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,[email protected],Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,[email protected],Female,189.106.181.194,26/06/1989
3,Laverna,Hamlen,[email protected],Female,52.165.62.174,24/04/1990
4,David,Gillfillan,[email protected],Male,83.249.190.232,31/10/1984
5,David,Gilfether,[email protected],Male,180.153.199.106,11/07/1995""")

# replace mystr with 'file.csv'
df = pd.read_csv(mystr)

# retrieve user inputs
first_name = input('Input a first name\n:')
gender = input('Input a gender, Male or Female\n:')

# calculate Boolean mask
mask = (df['first_name'] == first_name) & (df['gender'] == gender)

# apply mask to result
res = df[mask]

print(res)

#    id first_name   last_name                     email gender  \
# 3   4      David  Gillfillan       [email protected]   Male   
# 4   5      David   Gilfether  [email protected]   Male   

#         ip_address    birthday  
# 3   83.249.190.232  31/10/1984  
# 4  180.153.199.106  11/07/1995  

Upvotes: 0

Related Questions