Reputation: 369
I would like to match an entire line in a multi-line string (this code is part of a unit test that checks the correct output format).
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.match(r".*score = 0\.59.*", r"score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE)
<_sre.SRE_Match object; span=(0, 39), match='score = 0.65\\nscore = 0.59\\nscore = 1.0'>
This works fine; I can match anything within the multi-line string. However, I would like to make sure that I match an entire line. The documentation says that ^ and $ should match the beginning and end of each line when re.MULTILINE is used. However, this somehow does not work for me:
>>> re.match(r".*^score = 0\.59$.*", r"score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE)
>>>
Here are a few more experiments I made:
>>> import os
>>> re.match(r".*^score = 0\.59$.*", "score = 0.65{}score = 0.59{}score = 1.0".format(os.linesep, os.linesep), re.MULTILINE)
>>>
>>> re.match(r".*^score = 0\.65$.*", "score = 0.65{}score = 0.59{}score = 1.0".format(os.linesep, os.linesep), re.MULTILINE)
<_sre.SRE_Match object; span=(0, 12), match='score = 0.65'>
>>> re.match(r".*^score = 0\.65$.*", r"score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE)
>>>
I guess I'm missing something rather simple, but I couldn't figure it out.
Upvotes: 4
Views: 846
Reputation: 1837
The real answer to your question is that you simply confused match and search:
>>> import os, re
>>> print(re.match(r".*^score = 0\.59$.*", "score = 0.65\nscore = 0.59\nscore = 1.0", flags=re.MULTILINE))
None
>>> print(re.search(r".*^score = 0\.59$.*", "score = 0.65\nscore = 0.59\nscore = 1.0", flags=re.MULTILINE))
<_sre.SRE_Match object; span=(13, 25), match='score = 0.59'>
>>>
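That's why one of your non-raw examples worked while the other did not: match only tries the pattern at the very start of the string, and the 0.65 line happens to be the first line.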
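A minimal sketch of the difference, in case it helps for the unit test. The has_line helper is a hypothetical name I made up for illustration; it is not part of the re module.

```python
import re

text = "score = 0.65\nscore = 0.59\nscore = 1.0"
line_pattern = r"^score = 0\.59$"

# re.match only tries the pattern at index 0 of the string, so it cannot
# reach the second line even though ^ could match there under MULTILINE.
match_result = re.match(line_pattern, text, flags=re.MULTILINE)

# re.search tries every position, including the start of each line.
search_result = re.search(line_pattern, text, flags=re.MULTILINE)


def has_line(pattern, multiline_text):
    """Hypothetical helper: True if some whole line matches `pattern`."""
    anchored = "^(?:" + pattern + ")$"
    return re.search(anchored, multiline_text, flags=re.MULTILINE) is not None


print(match_result)                      # None
print(search_result.group(0))            # score = 0.59
print(has_line(r"score = 0\.59", text))  # True
print(has_line(r"score = 0\.5", text))   # False, only part of a line
```

With search there is no need for the leading and trailing .* at all, since ^ and $ already pin the pattern to one full line.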
Upvotes: 1
Reputation: 140168
The problem is that since you're using a raw string for your input string, \n is seen as ... well, \ followed by n. The regex engine understands \n in the pattern, but not in the input string.
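A quick way to see this for yourself (a small sketch; the variable names are only for illustration):

```python
import re

raw_text = r"score = 0.65\nscore = 0.59"   # backslash followed by 'n': one line
real_text = "score = 0.65\nscore = 0.59"   # an actual newline: two lines

print(len(raw_text), len(real_text))       # 26 25
print(raw_text.splitlines())               # a single element
print(real_text.splitlines())              # two elements

pattern = r"^score = 0\.59$"

# In the raw string there is no newline, so ^ can only match at index 0.
raw_result = re.search(pattern, raw_text, flags=re.MULTILINE)

# In the normal string, ^ also matches right after the newline.
real_result = re.search(pattern, real_text, flags=re.MULTILINE)

print(raw_result)               # None
print(real_result.group(0))     # score = 0.59
```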
Also, even if it is not important here, always use the flags= keyword, as some regex functions (re.sub, for instance) take an extra count parameter before flags, and passing the flags positionally can lead to errors.
Like this:
>>> re.match(r".*^score = 0\.65$.*", "score = 0.65\nscore = 0.59\nscore = 1.0", flags=re.MULTILINE)
<_sre.SRE_Match object; span=(0, 12), match='score = 0.65'>
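To illustrate why the flags= keyword matters, here is a small sketch with re.sub, whose signature is sub(pattern, repl, string, count=0, flags=0). The sample text and replacement are only for illustration, and the warning filter is there on the assumption that newer Python versions may warn about a positional count.

```python
import re
import warnings

text = "score = 0.65\nscore = 0.59"

# A flag passed positionally lands in `count`. re.MULTILINE has the integer
# value 8, so this call means "at most 8 replacements, no flags at all".
with warnings.catch_warnings():
    # Newer Python versions may emit a DeprecationWarning here; silence it.
    warnings.simplefilter("ignore")
    wrong = re.sub(r"^score", "SCORE", text, re.MULTILINE)

# With the keyword, the flag is really applied and ^ matches on every line.
right = re.sub(r"^score", "SCORE", text, flags=re.MULTILINE)

print(repr(wrong))  # 'SCORE = 0.65\nscore = 0.59'
print(repr(right))  # 'SCORE = 0.65\nSCORE = 0.59'
```

The first call does not raise anything, it just silently behaves as if the flag were missing, which is what makes the mistake hard to spot.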
And as I noted in the comments, .* needs re.DOTALL to match newlines:
>>> re.match(r".*^score = \d+\.\d+$.*", "score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE|re.DOTALL)
<_sre.SRE_Match object; span=(0, 37), match='score = 0.65\nscore = 0.59\nscore = 1.0'>
(as noted in "Python regex, matching pattern over multiple lines.. why isn't this working?" and "How do I match any character across multiple lines in a regular expression?", of which this could be a duplicate if it weren't for the raw string bit)
(sorry, my floating point regex is probably a bit weak, you can find better ones around)
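For completeness, a short sketch comparing the same re.match call with and without re.DOTALL, using the pattern from the question:

```python
import re

text = "score = 0.65\nscore = 0.59\nscore = 1.0"
pattern = r".*^score = 0\.59$.*"

# Without DOTALL, "." stops at a newline, so the leading ".*" cannot carry
# re.match from index 0 to the start of the second line.
without_dotall = re.match(pattern, text, flags=re.MULTILINE)

# With DOTALL, ".*" may span newlines, so the match covers the whole string.
with_dotall = re.match(pattern, text, flags=re.MULTILINE | re.DOTALL)

print(without_dotall)      # None
print(with_dotall.span())  # (0, 37)
```

Note that with DOTALL the match object covers the entire input, not just the line of interest, so this is mostly useful as a yes/no check.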
Upvotes: 3
Reputation: 520958
You need to match against a non-raw string and use DOTALL mode:
print(re.match(r".*^score = 0\.59$.*", "score = 0.65\nscore = 0.59\nscore = 1.0",
               re.MULTILINE|re.DOTALL))
<_sre.SRE_Match object at 0x7fd2426d0648>
Upvotes: 2