Comparing two files for similarities, not the commonly asked question

Question

My problem: I have two .txt files, and I would like to use one of them as a guide for filtering the second, appending the matching lines to a new, third .txt file.

For example: File 1: a list of names. File 2: a list of names and email addresses.

If a line in file 2 does not contain any name from file 1, that line should be dropped; every line that does match should be appended to a new .txt file.

I have googled this question every way I could word it, and I even found a web application that does exactly this, but it cannot handle files of the size I need. I have attempted to write a Python script for this (I am fairly new to programming). From what I have read, I'm sure it would be easier using something like NumPy, which I do not know. I just need a nudge in the right direction, as this is slightly outside my skill set. I can write a web-scraping script using regex and other basic beginner material like that, but this is something I really need to solve quickly, and I cannot find a solution elsewhere that truly fits the problem. Every other answer to similarly worded questions either deals with a single string or shows differences rather than similarities.

This is my attempt, which is obviously incorrect:

    file1 = input("Input file 1: ")
    file2 = input("Input file 2: ")
            
    with open("file1.txt", r) as f1:
        lines1 = f1.read.splitlines()
        names = file1.split(";")[0]
        emails = file1.split(";")[1]
    with open("file2.txt", r) as f2:
        lines2 = f2.read.splitlines()
            
        newfile = open("newfile", w)
            
    for names in lines2:
        strip(line)
        newfile.write(line)

I would really appreciate some advice or a nudge in the correct direction. Thank you!

File sample:

file 1:
1.ustrading@uste-miami.com  
2.georgeanddonna@reagan.com  
3.sbright@carltonrochell.com  
4.mary@roadrunnerss.com  

File 2:  
1.Jack Young;ustrading@uste-miami.com  
2.George Russel;georgeanddonna@reagan.com  
3.Susan Shields;sbright@carltonrochell.com  
4.Mary Cartwright;mary@roadrunnerss.com  
5.Heather Carter;heatherc@bridgerkitchens.com  
6.Denise Black;dd@genereux.us  
7.Tanner Tennebaum;ctannenbaum@chefswithaltitude.com  
8.John Grable;jgrable@johngrable.com  
9.Connor Hawk;cmhworld@rof.net

So I am looking to extract the first 4 Name;Email lines from file 2, using file 1 as the list of entries I am interested in.
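For reference, here is a minimal sketch of one way to approach this with only the standard library (no NumPy needed). It assumes the layout in the sample above: file 1 holds one email address per line, file 2 holds `Name;Email` per line, and the leading `1.`, `2.` numbering is only there for display and is not in the real files. The function name `filter_matching_lines` and its parameters are hypothetical, chosen just for illustration.

```python
def filter_matching_lines(keys_path, data_path, out_path):
    """Copy the lines of data_path whose email field appears in keys_path.

    Returns the number of lines written to out_path.
    """
    # Load file 1 into a set so each membership check is fast,
    # even when file 1 is large. Matching is case-insensitive (an assumption).
    with open(keys_path, encoding="utf-8") as keys_file:
        wanted = {line.strip().lower() for line in keys_file if line.strip()}

    kept = 0
    # Stream file 2 one line at a time so it never has to fit in memory.
    with open(data_path, encoding="utf-8") as data_file, \
         open(out_path, "w", encoding="utf-8") as out_file:
        for line in data_file:
            record = line.strip()
            if not record:
                continue  # skip blank lines
            # Assumed layout: Name;Email (split on the first semicolon only).
            name, sep, email = record.partition(";")
            if sep and email.strip().lower() in wanted:
                out_file.write(record + "\n")
                kept += 1
    return kept
```

Calling `filter_matching_lines("file1.txt", "file2.txt", "file3.txt")` would then write only the matching `Name;Email` lines to `file3.txt`. If file 1 really contains names rather than emails, the same idea should work by comparing `name` instead of `email`.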
