k6ps

Reputation: 369

How to match line beginning and end in a multi-line string

I would like to match an entire line in a multi-line string (this code is part of a unit test that checks the output format).

Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.match(r".*score = 0\.59.*", r"score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE)
<_sre.SRE_Match object; span=(0, 39), match='score = 0.65\\nscore = 0.59\\nscore = 1.0'>

This works fine: I can match anything within a multi-line string. However, I would like to make sure that I match an entire line. The documentation says that ^ and $ should match the beginning and end of a line when re.MULTILINE is used. However, this somehow does not work for me:

>>> re.match(r".*^score = 0\.59$.*", r"score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE)
>>> 

Here are a few more experiments I made:

>>> import os
>>> re.match(r".*^score = 0\.59$.*", "score = 0.65{}score = 0.59{}score = 1.0".format(os.linesep, os.linesep), re.MULTILINE)
>>>
>>> re.match(r".*^score = 0\.65$.*", "score = 0.65{}score = 0.59{}score = 1.0".format(os.linesep, os.linesep), re.MULTILINE)
<_sre.SRE_Match object; span=(0, 12), match='score = 0.65'>
>>> re.match(r".*^score = 0\.65$.*", r"score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE)
>>> 

I guess I'm missing something rather simple, but I couldn't figure it out.

Upvotes: 4

Views: 846

Answers (3)

mportes

Reputation: 1837

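The real answer to your question is that you confused match and search: re.match only tries to match at the very start of the string, while re.search scans the string for the first position where the pattern matches: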

>>> import os, re
>>> print(re.match(r".*^score = 0\.59$.*", "score = 0.65\nscore = 0.59\nscore = 1.0", flags=re.MULTILINE))
None
>>> print(re.search(r".*^score = 0\.59$.*", "score = 0.65\nscore = 0.59\nscore = 1.0", flags=re.MULTILINE))
<_sre.SRE_Match object; span=(13, 25), match='score = 0.59'>
>>> 

That's why one of your non-raw examples worked while the other did not: the 0.65 line sits at the very start of the string, where match looks, and the 0.59 line does not.
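A minimal sketch of the same difference with the leading and trailing .* dropped, since they are not needed once search is used. The names TEXT and PATTERN are only for illustration:

```python
import re

TEXT = "score = 0.65\nscore = 0.59\nscore = 1.0"
PATTERN = r"^score = 0\.59$"

# re.match only tries position 0, so a line further down is never found.
match_result = re.match(PATTERN, TEXT, flags=re.MULTILINE)

# re.search scans forward and finds the second line.
search_result = re.search(PATTERN, TEXT, flags=re.MULTILINE)

print(match_result)            # None
print(search_result.group(0))  # score = 0.59
print(search_result.span())    # (13, 25)
```

With re.MULTILINE, ^ and $ anchor the pattern to a whole line, so a line such as "score = 0.590" would not be accepted.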

Upvotes: 1

Jean-François Fabre

Reputation: 140168

The problem is that, because you're using a raw string for the input, \n is seen as a backslash followed by the letter n rather than as a newline. The regex engine understands \n in the pattern, but it does not interpret escapes in the input string.
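A quick way to see this, as a small sketch (the variable names raw and cooked are only for illustration):

```python
# Raw literal: backslash and "n" are kept as two separate characters.
raw = r"score = 0.65\nscore = 0.59"
# Regular literal: "\n" is turned into a single newline character.
cooked = "score = 0.65\nscore = 0.59"

print(len(raw))             # 26
print(len(cooked))          # 25
print("\n" in raw)          # False
print(raw.splitlines())     # ['score = 0.65\\nscore = 0.59']
print(cooked.splitlines())  # ['score = 0.65', 'score = 0.59']
```

Since the raw string contains no real newline, it is a single line as far as ^, $ and . are concerned. That is also why the first pattern in the question, with .* on both sides, matched the whole raw string.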

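Also, even though it doesn't matter here, always pass flags with the flags= keyword. Some regex functions take an extra parameter before flags (for example, count in re.sub), so a flag passed positionally can silently end up in the wrong parameter.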
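A small sketch of that pitfall with re.sub, whose signature is re.sub(pattern, repl, string, count=0, flags=0). The warnings filter is an assumption added so the example stays quiet on newer Python versions, which warn about passing count positionally:

```python
import re
import warnings

text = "a\na"

with warnings.catch_warnings():
    # Newer Python versions may emit a DeprecationWarning for this call.
    warnings.simplefilter("ignore")
    # re.MULTILINE lands in the count parameter, so no flag is applied.
    wrong = re.sub(r"^a", "b", text, re.MULTILINE)

# With the keyword, ^ matches at the start of every line.
right = re.sub(r"^a", "b", text, flags=re.MULTILINE)

print(repr(wrong))  # 'b\na'
print(repr(right))  # 'b\nb'
```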

Like this:

>>> re.match(r".*^score = 0\.65$.*", "score = 0.65\nscore = 0.59\nscore = 1.0", flags=re.MULTILINE)
<_sre.SRE_Match object; span=(0, 12), match='score = 0.65'>

And, as I noted in the comments, .* needs re.DOTALL to match newlines:

>>> re.match(r".*^score = \d+\.\d+$.*", "score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE|re.DOTALL)
<_sre.SRE_Match object; span=(0, 37), match='score = 0.65\nscore = 0.59\nscore = 1.0'>

(As noted in "Python regex, matching pattern over multiple lines.. why isn't this working?" and "How do I match any character across multiple lines in a regular expression?", of which this could be a duplicate if it weren't for the raw string issue.)

(Sorry, my floating-point regex is probably a bit weak; you can find better ones around.)
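To illustrate the re.DOTALL point in isolation, here is a rough sketch using re.findall (the variable names are only for illustration):

```python
import re

text = "score = 0.65\nscore = 0.59\nscore = 1.0"

# Without DOTALL, "." stops at each newline, so ".+" yields one hit per line.
per_line = re.findall(r".+", text)

# With DOTALL, "." also matches "\n", so ".+" swallows the whole string.
whole = re.findall(r".+", text, flags=re.DOTALL)

print(per_line)  # ['score = 0.65', 'score = 0.59', 'score = 1.0']
print(whole)     # ['score = 0.65\nscore = 0.59\nscore = 1.0']
```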

Upvotes: 3

Tim Biegeleisen

Reputation: 520958

You need to match against a non-raw string and use DOTALL mode, so that .* can cross the newlines:

print(re.match(r".*^score = 0\.59$.*", "score = 0.65\nscore = 0.59\nscore = 1.0",
    flags=re.MULTILINE|re.DOTALL))

<_sre.SRE_Match object at 0x7fd2426d0648>

Upvotes: 2
