Reputation: 186
I am trying to search a text file for a string that spans two separate lines, matching part (or all) of the text on each line. I need to return the line number (within the text file) where the match starts (the first of the two lines).
An example text file could be:
This is some text on the first line
Here is some more or the second line
This third line has more text.
If I tried to find the following string "second line This third line" it would return a line number of 2 (or really 1 if 0 is the first line).
I have looked at many similar examples and it seems that I should use the re package; however, I cannot work out how to return the line number (see Python - Find line number from text file, Python regex: Search across multilines, and re.search Multiple lines Python).
This code finds the string across multiple lines:
import re
a = open('example.txt','r').read()
if re.findall('second line\nThis third line', a, re.MULTILINE):
    print('found!')
The code below reads the text file in a loop, line by line. I realise it will not find a match for the multiline string because it is reading one line at a time.
with open('example.txt') as f:
    for line_no, line in enumerate(f):
        if line == 'second line\nThis third line':
            print('String found on line: ' + str(line_no))
            break
    else:  # for loop ended => line not found
        line_no = -1
        print('\nString Not found')
Question: How do I get the code in my first example to return the line number of the text file, or place this code in some sort of loop that counts the lines?
Upvotes: 1
Views: 5541
Reputation: 461
Use .count() and the match object to count the number of newlines before the match:
import re
with open('example.txt', 'r') as file:
    content = file.read()
match = re.search('second line\nThis third line', content)
if match:
    print('Found a match starting on line', content.count('\n', 0, match.start()))
match.start() is the position of the start of the match in content.
content.count('\n', 0, match.start()) counts the number of newlines in content between character position 0 and the start of the match.
Use 1 + content.count('\n', 0, match.start()) if you prefer line numbers to start at 1 instead of 0.
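As a minimal sketch of the same idea wrapped in a reusable function: the name first_line_of_match and the one_based flag are hypothetical, and re.escape is an addition on my part so that characters such as '.' or '(' in the search string are treated literally rather than as regex syntax. The sample text is the example file from the question.

```python
import re


def first_line_of_match(content, needle, one_based=False):
    """Return the line number where `needle` starts in `content`, or None."""
    # re.escape makes the needle a literal pattern (assumption: the caller
    # wants a plain-text search, not a regular expression).
    match = re.search(re.escape(needle), content)
    if match is None:
        return None
    # Number of newlines before the match start == zero-based line index.
    line_no = content.count('\n', 0, match.start())
    return line_no + 1 if one_based else line_no


sample = (
    "This is some text on the first line\n"
    "Here is some more or the second line\n"
    "This third line has more text.\n"
)

print(first_line_of_match(sample, "second line\nThis third line"))        # 1
print(first_line_of_match(sample, "second line\nThis third line", True))  # 2
print(first_line_of_match(sample, "no such text"))                        # None
```

Returning None for "not found" keeps line 0 distinguishable from a failed search.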
Upvotes: 3
Reputation: 1016
This may work for you:
import re
a = open('example.txt','r').read()
if re.findall('second line\nThis third line', a, re.MULTILINE):
    print('found!')
with open('example.txt') as f:
    count = 0
    line1 = 'second line\nThis third line'
    line1 = line1.split('\n')
    found = 0
    not_found = 0
    for line_no, line in enumerate(f):
        if line1[count] in line:
            count += 1
            if count == 1:
                found = line_no
            if count == len(line1):
                not_found = 1
                print('String found on line: ' + str(found))
                break  # stop here; otherwise line1[count] is out of range on the next line
        elif count > 0:
            # partial match broken: restart and re-test this line against the first part
            count = 0
            if line1[count] in line:
                count += 1
                if count == 1:
                    found = line_no
                if count == len(line1):
                    not_found = 1
                    print('String found on line: ' + str(found))
                    break
    if not_found == 0:  # for loop ended => line not found
        line_no = -1
        print('\nString Not found')
Upvotes: 0
Reputation: 43169
You would either need the whole content as a string (file.read()) or could try:
found = None
for idx, line in enumerate(your_file_pointer_here):
    if "second line" in line:
        # or line.endswith()
        found = idx
    elif "This third line" in line:
        # or line.startswith()
        # compare against None so a match starting on line 0 is not skipped
        if found is not None and (idx - 1) == found:
            print("Found the overall needle at {}".format(found))
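The loop above hard-codes the two halves of the needle. A possible generalisation, as a sketch only: split the needle on '\n' and keep a sliding window of the most recent lines, so the file is never read in full. The function name find_across_lines is hypothetical, and the matching rule (first line must end with the first part, last line must start with the last part, any middle lines must match exactly) is my assumption about what "spanning lines" should mean.

```python
from collections import deque


def find_across_lines(lines, needle):
    """Return the zero-based index of the line where `needle` starts, or None."""
    parts = needle.split('\n')
    # Keep only as many recent lines as the needle spans.
    window = deque(maxlen=len(parts))
    for idx, line in enumerate(lines):
        window.append(line.rstrip('\n'))
        if len(window) < len(parts):
            continue
        if len(parts) == 1:
            # Single-line needle: a plain substring test is enough.
            hit = parts[0] in window[0]
        else:
            hit = (window[0].endswith(parts[0])
                   and window[-1].startswith(parts[-1])
                   and list(window)[1:-1] == parts[1:-1])
        if hit:
            return idx - len(parts) + 1
    return None


sample_lines = [
    "This is some text on the first line\n",
    "Here is some more or the second line\n",
    "This third line has more text.\n",
]

print(find_across_lines(sample_lines, "second line\nThis third line"))  # 1
```

Any iterable of lines works here, so an open file object can be passed in place of sample_lines.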
Upvotes: 0