Reputation: 47
I'm a beginner in programming, using Python. I'm trying to read a multi-line text file and apply a regex to find every match. The problem is that I need to know on which line of the file each match starts.
import re

# path holds the name of the text file to read
with open(path) as file:
    data = file.read()
result = re.finditer(r'--(\d+)\t+.+(?:\n--\1\t+.+)*', data)
for match in result:
    ...
Since I read the whole file at the beginning, I'm using finditer to find all the matches in the content. Is there a way to tell on which line each match starts? I can't seem to find anything about this in the documentation.
Upvotes: 1
Views: 346
Reputation: 1121486
Match objects have Match.start() and Match.end() methods that give you the offsets of the start and end of a match within the entire string. Count the number of \n line separators before that point to translate these into a line number.
The following function counts the number of newlines preceding the matched position, adding 1 to number lines starting at 1 rather than at 0:
def line_for_match(m):
    return m.string.count("\n", 0, m.start()) + 1
If your matches can span multiple lines, you may want to use m.end() in there and count the number of newlines matched between start and end by counting \n characters in m[0].
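For multi-line matches, one way to put that together is sketched below. The helper name line_span_for_match and the sample input are made up for illustration; only the pattern comes from the question. It assumes the matched text does not end in a newline, which holds for that pattern.

```python
import re

def line_span_for_match(m):
    """Return (first_line, last_line), both 1-based, for a match object."""
    # Newlines before the match start give the first line.
    first = m.string.count("\n", 0, m.start()) + 1
    # Newlines inside the matched text give the number of extra lines covered.
    last = first + m[0].count("\n")
    return first, last

# Made-up sample input in the shape the question's pattern expects.
data = "header\n--1\tfoo\n--1\tbar\n--2\tbaz\ntrailer\n"
pattern = r'--(\d+)\t+.+(?:\n--\1\t+.+)*'

spans = [line_span_for_match(m) for m in re.finditer(pattern, data)]
print(spans)  # [(2, 3), (4, 4)]
```

The first match covers the two "--1" lines (lines 2 and 3); the second covers only the "--2" line.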
The function makes use of the extra arguments to the str.count() method to limit counting to the part of the input string (referenced via Match.string); they are the start and end positions in the string, respectively.
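A small illustration of those two extra arguments, using a made-up string. The bounded call counts the same characters as slicing first would, but without building a copy of the slice:

```python
text = "a\nb\nc\nd"
offset = text.index("c")  # position of "c" in the string

total = text.count("\n")              # newlines in the whole string
before = text.count("\n", 0, offset)  # only newlines in text[0:offset]

print(total, before)  # 3 2
# Same result as counting in a slice, which copies the substring first.
print(before == text[:offset].count("\n"))  # True
```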
Demo:
>>> import re
>>> def line_for_match(m):
...     return m.string.count("\n", 0, m.start()) + 1
...
>>> data = "foosball\nbartender\nbazar\n" * 3
>>> pattern = r"(?:foo|bar)"
>>> print(data)
foosball
bartender
bazar
foosball
bartender
bazar
foosball
bartender
bazar
>>> for m in re.finditer(pattern, data):
...     print(f"{line_for_match(m)}: {m[0]}")
...
1: foo
2: bar
4: foo
5: bar
7: foo
8: bar
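Note that line_for_match rescans the string from the beginning for every match. If that ever becomes a bottleneck on a large file with many matches, one possible alternative (a different technique from the function above, not something the re module provides) is to record the newline offsets once and binary-search them with the bisect module. The name make_line_lookup is hypothetical:

```python
import re
from bisect import bisect_left

def make_line_lookup(text):
    """Return a function mapping a string offset to a 1-based line number."""
    # Offsets of every newline, collected in a single pass.
    newline_offsets = [m.start() for m in re.finditer("\n", text)]

    def line_at(offset):
        # bisect_left yields how many newlines sit strictly before offset.
        return bisect_left(newline_offsets, offset) + 1

    return line_at

data = "foosball\nbartender\nbazar\n" * 3
line_at = make_line_lookup(data)
lines = [line_at(m.start()) for m in re.finditer(r"(?:foo|bar)", data)]
print(lines)  # [1, 2, 4, 5, 7, 8]
```

This gives the same line numbers as the demo above; each lookup is a binary search rather than a scan of the text.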
Upvotes: 5