licm

Reputation: 99

Search for content segments of one .txt file in another .txt file digit by digit and print matching lines

I have two txt files: file1.txt and file2.txt.

file1.txt contains the following:

12345678

file2.txt contains the following:

34567999
23499899
13571234

First, I want to take the first three digits of line 1 of file1.txt ("123") and search file2.txt for them. Whenever a line contains these digits consecutively and in that order (as line 3, 13571234, does), I want to write that line to a new file, file_new.txt.

Once every line in file2.txt has been searched for this sequence ("123"), I want to move one digit further along the line in file1.txt, so that the new search query is "234". I then want to search file2.txt again for all lines containing "234", i.e. line 2 (23499899) and line 3 (13571234). Since line 3 is already in file_new.txt, I only want to write line 2 to file_new.txt.

I want to continue this process, shifting by one digit each time, until every three-digit sequence in the line from file1.txt has been searched for in file2.txt.
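The procedure described above can be sketched in memory with the example data, without reading any files. This is a minimal illustration under that assumption, not a complete solution: the function name find_matches and its width parameter (the window size, 3 here) are hypothetical.

```python
def find_matches(search_line, candidate_lines, width=3):
    """Return the candidate lines that contain any `width`-digit window of
    search_line, in the order they are first found, without duplicates.
    (Hypothetical helper for illustration.)"""
    matches = []
    # Slide the window one digit at a time: "123", "234", ..., "678"
    for start in range(len(search_line) - width + 1):
        window = search_line[start:start + width]
        # Check every candidate line before moving the window on
        for line in candidate_lines:
            if window in line and line not in matches:
                matches.append(line)
    return matches


file1_line = "12345678"
file2_lines = ["34567999", "23499899", "13571234"]

print(find_matches(file1_line, file2_lines))
# → ['13571234', '23499899', '34567999']
```

"123" finds line 3, "234" adds line 2 (line 3 is skipped as a duplicate), and "345" adds line 1; the remaining windows find nothing new.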

Could someone please help me tackle this problem?

Upvotes: 1

Views: 125

Answers (2)

user215865

Reputation: 512

You can use readlines() to read each text file into a list and then build a new list L of matching lines with a while loop, as below. You can then write this list L to a text file.

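file1_path = 'file1.txt'
file2_path = 'file2.txt'

with open(file1_path) as file1:
    search_string = file1.readlines()[0].strip()  # first line, without the trailing newline

with open(file2_path) as file2:
    strings_to_search = [line.strip() for line in file2.readlines()]  # one entry per line

L = []
n = 0
while n < len(search_string) - 2:  # stop when fewer than three digits remain
    for i in strings_to_search:
        # keep lines that contain the current three digits and are not in L yet
        if search_string[n:n+3] in i and i not in L:
            L.append(i)
    n += 1  # move one digit further only after every line has been checked

with open('file_new.txt', 'w') as file_new:
    for line in L:
        file_new.write(line + '\n')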
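If you prefer to write each line to file_new.txt as soon as it is found, which is closer to the steps in the question, a sketch along these lines should work. It assumes the file names from the question; the function write_matches and its parameters are hypothetical. A set remembers the lines already written, so the "already in file_new.txt" check is a quick lookup instead of a scan of a list.

```python
def write_matches(file1_path="file1.txt", file2_path="file2.txt",
                  out_path="file_new.txt", width=3):
    """Hypothetical helper: write each line of file2 that contains a
    `width`-digit window of file1's first line, the first time it matches."""
    with open(file1_path) as file1:
        search_string = file1.readline().strip()  # first line, no newline

    with open(file2_path) as file2:
        # one entry per non-blank line, newline removed
        lines = [line.strip() for line in file2 if line.strip()]

    written = set()  # lines already written to out_path
    with open(out_path, "w") as out:
        for start in range(len(search_string) - width + 1):
            window = search_string[start:start + width]
            for line in lines:
                if window in line and line not in written:
                    out.write(line + "\n")  # write immediately when found
                    written.add(line)
```

With the example files from the question, calling write_matches() should leave 13571234, 23499899 and 34567999 in file_new.txt, one per line.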

Upvotes: 2

Sozy

Reputation: 187

Here is a small solution:

with open('file1.txt', 'r') as f1:  # open in read mode
    first_line = f1.readline().strip()  # read line 1 once, without its newline

with open('file2.txt', 'r') as f2:  # open in read mode
    lines = [line.strip() for line in f2.readlines()]  # we read all lines once

file_new = []
for digit in range(len(first_line) - 2):
    threedigits = first_line[digit:digit+3]  # the current three digits
    for i in lines:
        if threedigits in i and i not in file_new:
            file_new.append(i)  # we add each new line containing the three digits

with open('file_new.txt', 'w') as f3:  # open in write mode once, after all searches
    for i in file_new:
        f3.write(i + '\n')  # we write all matching lines

This should do it.
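The loops above collect lines in the order in which the three-digit windows find them. If file_new.txt should instead keep the original order of file2.txt, one variation (not what either answer does) is to build all windows first and then scan the lines of file2.txt once. This sketch works on in-memory lists, and the name keep_in_file2_order is hypothetical.

```python
def keep_in_file2_order(search_line, candidate_lines, width=3):
    """Hypothetical helper: return candidate lines containing any
    `width`-digit window of search_line, in candidate_lines order."""
    # Every window of the search line, e.g. {"123", "234", ..., "678"}
    windows = {search_line[i:i + width]
               for i in range(len(search_line) - width + 1)}
    result = []
    for line in candidate_lines:
        # keep the line if any window occurs in it; skip repeated lines
        if line not in result and any(w in line for w in windows):
            result.append(line)
    return result


print(keep_in_file2_order("12345678", ["34567999", "23499899", "13571234"]))
# → ['34567999', '23499899', '13571234']
```

Storing the windows in a set also means a three-digit sequence that occurs twice in file1.txt is only checked once per line.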

Upvotes: 1
