Nic

Reputation: 186

Search a text file for a multi line string and return line number in Python

I am trying to search a text file for a string that matches part (or all) of the text on two consecutive lines. I need to return the line number (within the text file) where the match starts, i.e. the first of the two lines.

An example text file could be:

This is some text on the first line
Here is some more or the second line
This third line has more text.

If I searched for the string "second line This third line", it should return a line number of 2 (or 1 if the first line is numbered 0).

I have looked at many similar examples and it seems that I should use the re package, but I cannot work out how to return the line number (see: Python - Find line number from text file; Python regex: Search across multilines; re.search Multiple lines Python).

This code finds the string across multiple lines:

import re

a = open('example.txt','r').read()
if re.findall('second line\nThis third line', a, re.MULTILINE):
    print('found!')

The code below reads the text file in a loop, line by line. I realise it will not find a match for the multiline string because it only sees one line at a time.

with open('example.txt') as f:
    for line_no, line in enumerate(f):
        if line == 'second line\nThis third line':
            print ('String found on line: ' + str(line_no))
            break
    else: # for loop ended => line not found
        line_no = -1
        print ('\nString Not found')

Question: How do I get the code in my first example to return the line number in the text file, or how do I place this code in some sort of loop that counts the lines?

Upvotes: 1

Views: 5541

Answers (3)

Kim

Reputation: 461

Use .count() and the match object to count the number of newlines before the match:

import re

with open('example.txt', 'r') as file:
    content = file.read()
match = re.search('second line\nThis third line', content)
if match:
    print('Found a match starting on line', content.count('\n', 0, match.start()))

match.start() is the position of the start of the match in content.

content.count('\n', 0, match.start()) counts the number of newlines in content between character position 0 and the start of the match.

Use 1 + content.count('\n', 0, match.start()) if you prefer line numbers to start at 1 instead of 0.
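
The search string in the question separates the two parts with a space rather than a newline. One way to handle that, sketched here as an assumption rather than as part of the answer above, is to turn every run of whitespace in the needle into \s+ before searching. The helper name find_line_number and its first_line parameter are hypothetical, introduced only for illustration:

```python
import re

def find_line_number(content, needle, first_line=0):
    """Return the line number where needle starts in content, or -1.

    Any whitespace in needle matches any run of whitespace (including a
    newline) in content. first_line is the number given to the first line.
    """
    # Escape each word so regex metacharacters in the needle stay literal.
    pattern = r'\s+'.join(re.escape(word) for word in needle.split())
    match = re.search(pattern, content)
    if match is None:
        return -1
    # The number of newlines before the match is the zero-based line index.
    return first_line + content.count('\n', 0, match.start())

text = (
    "This is some text on the first line\n"
    "Here is some more or the second line\n"
    "This third line has more text.\n"
)
print(find_line_number(text, "second line This third line"))     # 1
print(find_line_number(text, "second line This third line", 1))  # 2
```

With the example file from the question this gives 1 when counting from 0 and 2 when counting from 1.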

Upvotes: 3

Bhargav Desai

Reputation: 1016

This may work for you:

import re

a = open('example.txt','r').read()
if re.findall('second line\nThis third line', a, re.MULTILINE):
    print('found!')

with open('example.txt') as f:
    # Split the search text into one fragment per line.
    parts = 'second line\nThis third line'.split('\n')
    count = 0    # fragments matched on consecutive lines so far
    start = -1   # line where the current partial match began
    found = -1   # line where the full match starts, -1 if none
    for line_no, line in enumerate(f):
        if parts[count] not in line:
            # Sequence broken; this line may still start a new match.
            count = 0
        if parts[count] in line:
            if count == 0:
                start = line_no
            count += 1
            if count == len(parts):
                found = start
                print('String found on line: ' + str(found))
                break  # stop here so parts[count] is never out of range
    if found == -1:  # for loop ended => string not found
        print('\nString Not found')

Upvotes: 0

Jan

Reputation: 43169

You would either need the whole content as a string (file.read()), or you could check line by line and remember where the first part was seen:

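found = None
for idx, line in enumerate(your_file_pointer_here):
    if "second line" in line:
        # or line.rstrip('\n').endswith("second line")
        found = idx
    elif "This third line" in line:
        # or line.startswith("This third line")
        # Compare with None explicitly: found is 0 when the match
        # starts on the first line, and 0 is falsy.
        if found is not None and idx - 1 == found:
            print("Found the overall needle at {}".format(found))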
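
The line-by-line idea can be extended to a needle that spans any number of lines by keeping a small sliding window of the most recent lines. This is only a sketch under assumptions: find_in_lines and fragment_matches are hypothetical names, and it assumes the first fragment must end its line, the last must start its line, and any fragments in between must match their lines exactly.

```python
from collections import deque
import io

def fragment_matches(line, frag, i, n):
    """Check fragment i (of n) against the line it is aligned with."""
    if n == 1:
        return frag in line          # single-line needle: plain substring
    if i == 0:
        return line.endswith(frag)   # first fragment runs to end of line
    if i == n - 1:
        return line.startswith(frag) # last fragment begins its line
    return line == frag              # middle fragments fill a whole line

def find_in_lines(lines, needle_lines):
    """Return the 0-based index of the line where the match starts, or -1.

    lines is any iterable of lines, such as an open file, so the file is
    never loaded into memory as a whole.
    """
    n = len(needle_lines)
    if n == 0:
        raise ValueError("needle_lines must not be empty")
    window = deque(maxlen=n)  # the last n lines read
    for idx, line in enumerate(lines):
        window.append(line.rstrip('\n'))
        if len(window) == n and all(
            fragment_matches(w, frag, i, n)
            for i, (w, frag) in enumerate(zip(window, needle_lines))
        ):
            return idx - n + 1
    return -1

text = (
    "This is some text on the first line\n"
    "Here is some more or the second line\n"
    "This third line has more text.\n"
)
needle = "second line\nThis third line".split('\n')
print(find_in_lines(io.StringIO(text), needle))  # 1
```

Passing an open file object instead of io.StringIO(text) works the same way, since both yield one line per iteration.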

Upvotes: 0
