HarryD
Reputation: 63

Python search csv file from input text file

I'm new to Python and I'm struggling with this code. I have two files: the first is a text file containing email addresses (one per line), and the second is a CSV file with 5-6 columns. The script should take each search term from file 1, look for it in file 2, and store the matching rows in another CSV file (first 3 columns only); see the example below. I have also included the script I was working on. If there is a better or more efficient approach, please let me know. Thank you, I appreciate your help.

File1 (output.txt)
[email protected]
[email protected]
[email protected]

File2 (final.csv)
Sam,Smith,[email protected],admin
Eric,Smith,[email protected],finance
Joe,Doe,[email protected],telcom
Chase,Li,[email protected],IT

output (out_name_email.csv)
Eric,Smith,[email protected]
Chase,Li,[email protected]

Here is the script

import csv
outputfile = 'C:\\Python27\\scripts\\out_name_email.csv'
inputfile = 'C:\\Python27\\scripts\\output.txt'
datafile = 'C:\\Python27\\scripts\\final.csv'

names=[]

with open(inputfile) as f:
    for line in f:
        names.append(line)

with open(datafile, 'rb') as fd, open(outputfile, 'wb') as fp_out1:
    writer = csv.writer(fp_out1, delimiter=",")
    reader = csv.reader(fd, delimiter=",")
    headers = next(reader)
    for row in fd:
        for name in names:
            if name in line:
                writer.writerow(row)

Upvotes: 2

Views: 510

Answers (1)

Jon Clements
Reputation: 142256

Load the emails into a set for O(1) lookup:

with open(inputfile) as fin:
    emails = set(line.strip() for line in fin)

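Then loop over the rows once and check whether each row's email is in emails. There is no need to loop over every possible match for each row:

# ...
for row in reader:
    # the email is the third column (index 2) in the sample final.csv
    if row[2] in emails:
        # keep only the first three columns, as in the desired output
        writer.writerow(row[:3])

If you're not doing anything else in the loop, you can reduce it to:

writer.writerows(row[:3] for row in reader if row[2] in emails)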

A couple of notes: in your original code you're not using the csv.reader object reader (you're looping over fd instead), and you appear to have some naming issues with names, line and row...
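Putting the pieces together, here is a minimal end-to-end sketch. It is written for Python 3, so the files are opened in text mode with newline='' rather than the 'rb'/'wb' modes the question uses under Python 2.7. The function name filter_rows_by_email and its parameters (email_col, keep_cols, has_header) are hypothetical names made up for this illustration. It assumes the email sits in the third column (index 2), as in the sample rows, and that final.csv has no header row unless has_header=True is passed.

```python
import csv


def filter_rows_by_email(email_path, data_path, out_path,
                         email_col=2, keep_cols=3, has_header=False):
    """Copy rows of data_path whose email is listed in email_path to out_path.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    # One address per line: strip() drops the trailing newline, and blank
    # lines are skipped so they can never match a row.
    with open(email_path) as fin:
        emails = {line.strip() for line in fin if line.strip()}

    # The csv module documentation recommends newline='' for files passed
    # to csv.reader / csv.writer in Python 3.
    with open(data_path, newline='') as fd, \
            open(out_path, 'w', newline='') as fout:
        reader = csv.reader(fd)
        writer = csv.writer(fout)
        if has_header:
            next(reader, None)  # skip the header row, if the file has one
        matched = 0
        for row in reader:
            # Ignore short rows instead of raising IndexError.
            if len(row) > email_col and row[email_col] in emails:
                writer.writerow(row[:keep_cols])  # first keep_cols columns only
                matched += 1
    return matched
```

With the paths from the question this would be called as filter_rows_by_email(inputfile, datafile, outputfile). Matching is exact and case-sensitive; if the two files may differ in letter case, lowercasing both sides before comparing is one option.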

Upvotes: 3
