Reputation:
If I have a large multi-line string and I want a match to start partway through a line and run only to the end of that line, what is the best way to do that?
So, for example I have something like this and I want it to stop matching when it reaches the new line character.
r"(?P<name>[A-Za-z\s.]+)"
I saw this in a previous answer:
$ - indicates matching to the end of the string, or end of a line if multiline is enabled.
My question is then how do you "enable multiline" as the author of that answer states?
Upvotes: 8
Views: 25944
Reputation: 336158
Simply use
r"(?P<name>[A-Za-z\t .]+)"
This will match ASCII letters, spaces, tabs or periods. It'll stop at the first character that's not included in the character class, and newlines aren't (whereas they are included in \s). Because of that, it's irrelevant whether multiline mode is turned on or off.
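To make the difference concrete, here is a small sketch comparing the two character classes. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample input: two lines that both consist only of
# letters, spaces and periods.
text = "John A. Smith\nJane Doe\n"

# \s includes "\n", so the greedy match runs across the line break.
with_s = re.match(r"(?P<name>[A-Za-z\s.]+)", text)

# Listing tab and space explicitly leaves "\n" out of the class,
# so the match stops at the end of the first line.
without_newline = re.match(r"(?P<name>[A-Za-z\t .]+)", text)

print(repr(with_s.group("name")))           # spans both lines
print(repr(without_newline.group("name")))  # first line only
```

No flags are involved here; the character class alone decides where the match ends.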
Upvotes: 12
Reputation: 1228
You can enable multiline matching by passing re.MULTILINE as the second argument to re.compile(). However, there is a subtlety to watch out for: since the + quantifier is greedy, this regular expression will match as long a string as possible, so if the next line is made up of letters and whitespace, the regex might match more than one line (in multiline mode, $ matches at the end of any line, not only the first one).
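A minimal sketch of the flag and of the greedy pitfall, assuming a $ is appended to the pattern from the question. The sample text is invented for the demonstration.

```python
import re

# Hypothetical input: two name-like lines followed by a line of digits.
text = "Alice Smith\nBob Jones\n12345"

# Without the flag, $ only matches at the very end of the string
# (or just before a trailing newline), so nothing matches here.
default = re.compile(r"(?P<name>[A-Za-z\s.]+)$").search(text)

# With re.MULTILINE, $ also matches just before every newline.
# The greedy + still consumes the first "\n" (it is part of \s),
# so the match ends at the second line end, not the first.
multi = re.compile(r"(?P<name>[A-Za-z\s.]+)$", re.MULTILINE).search(text)

print(default)
print(repr(multi.group("name")))
```

The flag is enabled correctly in the second case, yet the match still covers two lines, which is the subtlety described above.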
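There are three solutions to this:
1. Change the regex so that (unlike \s) your repeated character set does not match that newline.
2. Use +?, the non-greedy ("minimal") version of +, so that it will match as short a string as possible and therefore stop at the first newline.
3. Split the text into lines first, for example with text.split('\n'), and match each line separately.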
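The three approaches can be sketched as follows. The sample text and variable names are assumptions for illustration, and the patterns are variations on the one in the question with a $ appended where needed.

```python
import re

# Hypothetical input: two name-like lines followed by a line of digits.
text = "Alice Smith\nBob Jones\n12345"

# Option 1: a character class without "\n" cannot cross a line break.
no_newline = re.compile(r"(?P<name>[A-Za-z\t .]+)$", re.MULTILINE)
first = no_newline.search(text).group("name")

# Option 2: the non-greedy +? stops at the first position where $ succeeds.
# This relies on the trailing $; without it, +? would match one character.
lazy = re.compile(r"(?P<name>[A-Za-z\s.]+?)$", re.MULTILINE)
second = lazy.search(text).group("name")

# Option 3: split first, then match each line on its own.
per_line = re.compile(r"(?P<name>[A-Za-z\s.]+)")
names = []
for line in text.split("\n"):
    m = per_line.match(line)
    if m:
        names.append(m.group("name"))

print(first, second, names)
```

Option 3 needs no flag at all, since each piece passed to the regex no longer contains a newline.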
Upvotes: 2
Reputation: 7419
Look at the flags parameter at http://docs.python.org/library/re.html#module-contents
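As a rough illustration of that parameter, the module-level functions such as re.findall() accept flags directly, and the same option can be written inline as (?m) at the start of the pattern. The sample text below is made up.

```python
import re

# Hypothetical two-line input.
text = "first alpha\nsecond beta"

# Default behaviour: $ matches only at the end of the whole string.
plain = re.findall(r"\w+$", text)

# flags= on a module-level function enables multiline mode.
with_flag = re.findall(r"\w+$", text, flags=re.MULTILINE)

# The inline form (?m) at the start of the pattern does the same.
inline = re.findall(r"(?m)\w+$", text)

# Flags can be combined with the bitwise OR operator.
combined = re.findall(r"^[A-Z]+", text, flags=re.MULTILINE | re.IGNORECASE)

print(plain, with_flag, inline, combined)
```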
Upvotes: 1