Reputation: 533
I'm having trouble finishing some Python code I've been working on and would appreciate any suggestions. I have two files:
file1
>name1
>name3
>name4
file2
>name1 blah blah
aaaaaaaaaaaaaaaaaaaaaaaaa
>name2 blah blah
cccccccaaaaaaaaaaaaaaaaaa
>name3 blah blah
aaaaaattttttttttaaaaaaaaa
>name4 blah blah
aaaaaattttttttttggggggggg
>name5 blah blah
aaaggggcccctttttggggggggg
Each line of file1 contains a string also found in file2. For each line of file1, I would like to find the line it matches in file2, then print that line and the next line. This is my desired final result:
>name1 blah blah
aaaaaaaaaaaaaaaaaaaaaaaaa
>name3 blah blah
aaaaaattttttttttaaaaaaaaa
>name4 blah blah
aaaaaattttttttttggggggggg
So far I have the following code:
nums = set()
with open("file1.txt") as file1:
    for line in file1:
        nums.add(line.strip())
with open("file2.txt") as file2, open("out.txt", "wt") as outfile:
    for line in file2:
        if any(word in line for word in nums):
            outfile.write(line)
This code presently contains two issues:
Any substring in file2 that matches a string in file1 is printed to outfile (using the example here, if >name3 is in the set nums, then lines starting with >name3 as well as >name31 and >name367 will be printed)
I haven't figured out how to print both the line that contains the match and the next line (perhaps this can be done with islice?)
Thanks for any advice!
Upvotes: 0
Views: 1770
Reputation: 7812
Any substring in file2 that matches a string in file1 is printed to outfile (using the example here, if >name3 is in the set nums, then lines starting with >name3 as well as >name31 and >name367 will be printed)
This problem can be solved in two ways.
Add a space.
If you're sure the keyword is always followed by a space, you can simply append a space to it before testing.
Example:
if any(word + " " in line for word in nums):
Use a regular expression.
Alternatively, import re and change:
if any(word in line for word in nums):
To:
if any(re.match(f"^{word}\\b", line) for word in nums):
Explanation: ^ matches the start of the line and \b is a word boundary, so >name3 no longer matches >name31. An online regex tester is handy for trying the pattern out.
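One caveat when building a pattern from file contents: if a name contains regex metacharacters such as "." or "+", they are treated as pattern syntax rather than literal text. A minimal sketch, assuming that can happen in your data, passes each name through re.escape first. The helper name header_matches and the sample lines are made up for illustration.

```python
import re

def header_matches(line, names):
    """Return True if line starts with one of names followed by a word boundary."""
    # re.match already anchors at the start of the string, so "^" is optional.
    # re.escape keeps characters such as "." or "+" in a name literal.
    return any(re.match(re.escape(name) + r"\b", line) for name in names)

names = {">name3"}
print(header_matches(">name3 blah blah\n", names))   # True
print(header_matches(">name31 blah blah\n", names))  # False
```

The rest of the loop stays the same; only the condition changes.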
I haven't figured out how to print both the line that contains the match and the next line (perhaps this can be done with islice?)
You iterate over the file with for line in file2:, which reads it line by line. There are a few ways to print the next line as well.
Boolean flag.
Declare a boolean variable before the loop and set it to False. Inside the loop, write the line to outfile if the variable is True, then set it back to False. Set it to True inside your existing condition.
Example:
read_next = False
for line in file2:
    if read_next:
        outfile.write(line)
        read_next = False
    if any(re.match(f"^{word}\\b", line) for word in nums):
        outfile.write(line)
        read_next = True
Change the loop from for to while.
You can use the readline() method to iterate over the file manually.
Example:
line = file2.readline()
while line:
    line = line.strip()
    if any(re.match(f"^{word}\\b", line) for word in nums):
        outfile.write(line + "\n")  # strip() removed the newline, so add it back
        line = file2.readline()
        if line:
            outfile.write(line)
        else:  # end of file reached right after a match
            break
    line = file2.readline()
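Since a file object is its own iterator, a third option is to stay in the for loop and pull the following line with next(). The sketch below assumes the name is the first whitespace-separated token of a header line, and it compares that token for equality against the set, which also avoids the substring problem. The function name copy_matches and the in-memory sample data are hypothetical; with real files you would pass the objects returned by open().

```python
import io

def copy_matches(names, infile, outfile):
    """Write each line whose first token is in names, plus the line after it."""
    for line in infile:
        tokens = line.split()
        if tokens and tokens[0] in names:
            outfile.write(line)
            # next() advances the same iterator; "" is used at end of file
            outfile.write(next(infile, ""))

sample = io.StringIO(">name3 blah\nAAAT\n>name31 blah\nCCCG\n")
out = io.StringIO()
copy_matches({">name3"}, sample, out)
print(out.getvalue(), end="")
```

The set lookup is a single hash check per line, so it does not slow down as file1 grows.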
Upvotes: 1
Reputation: 11238
l = []
# read the whole data file and group its lines in pairs:
# [header line, sequence line]
with open(r'C:\Users\user\RegForm.txt', 'r') as file:
    tmp = file.read().split('\n')
    for line in range(1, len(tmp), 2):
        l.append([tmp[line-1], tmp[line]])
# read the names to search for into a list
to_find = []
with open(r'C:\Users\user\untitled0.txt', 'r') as file:
    for line in file:
        to_find.append(line.strip('\n'))
# for each name, print the first record whose header contains it
# (note: this is a substring test, so ">name3" would also match ">name31")
for i in to_find:
    for j in l:
        if i in j[0]:
            print(j[0], j[1], sep='\n')
            break
"""
output
>name1 blah blah
aaaaaaaaaaaaaaaaaaaaaaaaa
>name3 blah blah
aaaaaattttttttttaaaaaaaaa
>name4 blah blah
aaaaaattttttttttggggggggg
"""
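The nested loop above scans every record for each name. A sketch of an alternative, assuming every record is exactly one header line followed by one sequence line and that the name is the first whitespace-separated token of the header, is to index the records in a dict keyed by name. index_records and the sample text are illustrative and not part of the code above.

```python
def index_records(lines):
    """Map the first token of each header line to its (header, sequence) pair."""
    records = {}
    # Pair lines 0-1, 2-3, ...; zip ignores a trailing unpaired line
    for header, sequence in zip(lines[0::2], lines[1::2]):
        tokens = header.split()
        if tokens:
            records[tokens[0]] = (header, sequence)
    return records

text = ">name1 blah blah\naaaa\n>name31 blah blah\ncccc\n>name3 blah blah\ntttt\n"
records = index_records(text.splitlines())
for name in [">name1", ">name3"]:
    if name in records:  # exact key lookup, so ">name3" does not hit ">name31"
        print(*records[name], sep="\n")
```

Because the lookup is by exact key, this also avoids the substring matches mentioned in the question.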
Upvotes: 0