Reputation: 17
My problem: I have two .txt files, and I would like to use one of them as a guide for filtering the second, appending the matching lines to a new, third .txt file.
For example: File 1: a list of names. File 2: a list of names and email addresses.
If a line in file 2 does not contain any name from file 1, drop that line; otherwise, append the matching line to a new .txt file.
I have googled this question every way I could word it, and have even found a web application that does exactly this, but it cannot handle files of the size I need. I have attempted to write a Python script for this (I am fairly new to programming); from what I have read, I'm sure it would be easier using something like NumPy, which I do not know. I just need a nudge in the right direction, as this is slightly outside of my skill set. I am capable of writing a script for web scraping using regex and other basic beginner tasks like that, but this is something I really need to solve quickly, and I cannot find a solution elsewhere that truly fits the problem. Every other answer to similar questions either deals with a single string or shows the differences between files, not the similarities.
This is my attempt, which is obviously incorrect:
file1 = input("Input file 1: ")
file2 = input("Input file 2: ")
with open("file1.txt", r) as f1:
    lines1 = f1.read.splitlines()
    names = file1.split(";")[0]
    emails = file1.split(";")[1]
with open("file2.txt", r) as f2:
    lines2 = f2.read.splitlines()
newfile = open("newfile", w)
for names in lines2:
    strip(line)
    newfile.write(line)
I would really appreciate some advice or a nudge in the correct direction. Thank you!
File sample:
file 1:
[email protected]
[email protected]
[email protected]
[email protected]
File 2:
1.Jack Young;[email protected]
2.George Russel;[email protected]
3.Susan Shields;[email protected]
4.Mary Cartwright;[email protected]
5.Heather Carter;[email protected]
6.Denise Black;[email protected]
7.Tanner Tennebaum;[email protected]
8.John Grable;[email protected]
9.Connor Hawk;[email protected]
So I am looking to parse the first 4 Name;Email lines in file 2 using file 1 as the source of interesting data.
Upvotes: 2
Views: 119
Reputation: 16496
My assumptions:
# first, read in the smaller set of the entries we are interested in finding
with open("file1.txt") as file1:
    # strip them of any leading or trailing whitespace (e.g. newlines) and empty lines
    needles = [line.strip() for line in file1.readlines()]
    needles = set(filter(bool, needles))
# now open the haystack file2 and the output file3
with open("file2.txt") as file2, open("file3.txt", "w") as file3:
    # we iterate line by line to not fill up the memory too much
    for line in file2:
        # strip any whitespace from the line
        line = line.strip()
        # ...and skip over empty lines
        if not line:
            continue
        # we assume each line contains exactly one semicolon that separates name from email
        name, email = line.split(";")
        # does the email match any of the ones we are looking for?
        if email in needles:
            # then format the line correctly and write it to the output file
            file3.write("{};{}\n".format(name, email))
Contents of the input files:
file1.txt
[email protected]
[email protected]
[email protected]
[email protected]
file2.txt
Jack Young;[email protected]
George Russel;[email protected]
Susan Shields;[email protected]
Mary Cartwright;[email protected]
Heather Carter;[email protected]
Denise Black;[email protected]
Tanner Tennebaum;[email protected]
John Grable;[email protected]
Connor Hawk;[email protected]
Contents of file3.txt after running the script:
Jack Young;[email protected]
George Russel;[email protected]
Susan Shields;[email protected]
Mary Cartwright;[email protected]
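The script above relies on every non-empty line of file2.txt containing exactly one semicolon; a line with none, or with more than one, would make the tuple unpacking raise a ValueError. If that assumption might not hold for your data, a slightly more defensive variant could look like the sketch below. The function name `filter_lines`, the choice to split on the last semicolon, and the case-insensitive comparison are my own assumptions, not something stated in the question, so adjust them to your files.

```python
def filter_lines(needles_path, haystack_path, output_path):
    """Copy the lines of haystack_path whose last ';'-separated field
    appears in needles_path. Returns the number of lines written."""
    with open(needles_path) as needles_file:
        # a set gives fast membership tests; blank lines are dropped
        # assumption: matching should ignore upper/lower case
        needles = {line.strip().lower() for line in needles_file if line.strip()}

    written = 0
    with open(haystack_path) as haystack, open(output_path, "w") as output:
        # iterate line by line so the large file is never fully in memory
        for line in haystack:
            line = line.strip()
            # rpartition splits on the LAST semicolon; sep is "" if there is none
            name, sep, email = line.rpartition(";")
            if not sep:
                # blank line or line without a semicolon: skip it
                continue
            if email.strip().lower() in needles:
                output.write(line + "\n")
                written += 1
    return written
```

Calling `filter_lines("file1.txt", "file2.txt", "file3.txt")` should write the same file3.txt as the script above for the sample input, while silently skipping malformed lines instead of crashing on them.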
Upvotes: 2