Master Viewer

Reputation: 47

Any way to get the line of a Match object?

I'm a beginner in programming, using Python. I'm trying to read a multi-line text file and apply a regex to find every match. The problem is that I need to know on which line of the file each match starts.

file = open(data)
data = file.read()
file.close()

result = re.finditer(r'--(\d+)\t+.+(?:\n--\1\t+.+)*', data)
for match in result:
    ...

Since I read the whole file at the beginning, I'm using finditer to find all the matches in the content. Is there a way to tell on which line each match starts? I can't seem to find anything about this in the documentation.

Upvotes: 1

Views: 346

Answers (1)

Martijn Pieters

Reputation: 1121486

Match objects have Match.start() and Match.end() methods that give you the offsets for the start and end of a match into the entire string. Count the number of \n line separators before that point to translate these into a line number.

The following function counts the number of newlines preceding the matched position, adding 1 to number lines starting at 1 rather than at 0:

def line_for_match(m):
    return m.string.count("\n", 0, m.start()) + 1

If your matches can span multiple lines, you may also want the line on which a match ends: use m.end() in place of m.start(), or count the \n characters in m[0] to get the number of newlines between the start and the end of the match.
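As a rough sketch of that idea, the helper below returns both the first and the last line of a match. The name line_span_for_match and the sample data are hypothetical; the pattern is the one from the question. It assumes the matched text does not end with a newline, otherwise the last line would be reported one too high.

```python
import re

def line_span_for_match(m):
    """Return (first_line, last_line), both 1-based, for a match object."""
    first = m.string.count("\n", 0, m.start()) + 1
    # Each newline inside the matched text means the match covers one more line.
    last = first + m[0].count("\n")
    return first, last

# Hypothetical sample input shaped like the question's pattern expects.
data = "header\n--1\tfoo\n--1\tbar\nother\n--2\tbaz\n"
pattern = r"--(\d+)\t+.+(?:\n--\1\t+.+)*"

spans = [line_span_for_match(m) for m in re.finditer(pattern, data)]
print(spans)
```

Here the first match covers lines 2 and 3 of the sample, and the second match sits entirely on line 5.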

The function uses the optional start and end arguments of the str.count() method to limit counting to the part of the input string (referenced via Match.string) that precedes the match.

Demo:

>>> import re
>>> def line_for_match(m):
...     return m.string.count("\n", 0, m.start()) + 1
...
>>> data = "foosball\nbartender\nbazar\n" * 3
>>> pattern = r"(?:foo|bar)"
>>> print(data)
foosball
bartender
bazar
foosball
bartender
bazar
foosball
bartender
bazar

>>> for m in re.finditer(pattern, data):
...     print(f"{line_for_match(m)}: {m[0]}")
...
1: foo
2: bar
4: foo
5: bar
7: foo
8: bar
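For large inputs with many matches, counting newlines from the start of the string for every match repeats a lot of work. One possible alternative, not part of the approach above, is to record the offset of every line start once and then binary-search that list for each match. The helper name make_line_lookup is hypothetical; bisect_right comes from the standard library's bisect module.

```python
import re
from bisect import bisect_right

def make_line_lookup(text):
    """Return a function mapping a string offset to a 1-based line number."""
    # Offsets at which each line begins: 0, then one past every newline.
    starts = [0]
    starts.extend(m.end() for m in re.finditer(r"\n", text))
    # The number of line starts at or before the offset is the line number.
    return lambda offset: bisect_right(starts, offset)

data = "foosball\nbartender\nbazar\n" * 3
line_of = make_line_lookup(data)

lines = [line_of(m.start()) for m in re.finditer(r"(?:foo|bar)", data)]
print(lines)
```

On the demo data this should give the same line numbers as line_for_match, with the cost of scanning for newlines paid once rather than per match.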

Upvotes: 5
