petobens

Reputation: 1390

Regex split with multiple lines

The following function splits a string on each occurrence of a pattern, but it doesn't work when the text inside the braces spans multiple lines:

import re
def header(text):
    authors = [i.strip() for i in re.split(r'\\and|\\thanks\{.*?\}', text, flags=re.M)]
    names = filter(None,authors)
    return '{} and {}'.format(', '.join(names[:-1]), names[-1])

print header(r"""John Bar \and Tom Foo\thanks{Testing if this works with 
multiple lines} \and Sam Baz""")

I don't know whether the regex is wrong or whether I'm using the flag incorrectly in the split function.

Upvotes: 1

Views: 936

Answers (2)

Jakub M.

Reputation: 33827

You should use the re.DOTALL flag. From the documentation:

re.S
re.DOTALL

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
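
Applied to the function from the question, this might look like the sketch below. It only swaps re.M for re.DOTALL; replacing filter() with a list comprehension is my own change so that the slicing also works on Python 3, where filter() returns an iterator.

```python
import re

def header(text):
    # re.DOTALL (alias re.S) lets '.' match newlines, so the lazy
    # \thanks{...} pattern can span several lines.
    parts = re.split(r'\\and|\\thanks\{.*?\}', text, flags=re.DOTALL)
    # Strip whitespace and drop the empty pieces left between delimiters.
    names = [p.strip() for p in parts if p.strip()]
    return '{} and {}'.format(', '.join(names[:-1]), names[-1])

result = header(r"""John Bar \and Tom Foo\thanks{Testing if this works with
multiple lines} \and Sam Baz""")
print(result)  # John Bar, Tom Foo and Sam Baz
```

Like the original, this assumes at least two names in the input.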

Upvotes: 2

Explosion Pills

Reputation: 191749

re.M only changes how the anchors ^ and $ behave in multi-line strings. What you want is re.S, which makes . match newlines as well.
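
A small illustration of the difference, using a made-up test string (not the one from the question):

```python
import re

text = "a{one\ntwo}b\nc"  # hypothetical sample with a newline inside the braces

# re.M affects only ^ and $: they also match at each line boundary.
start_default = re.findall(r'^\w', text)
start_multiline = re.findall(r'^\w', text, flags=re.M)
print(start_default)    # ['a']
print(start_multiline)  # ['a', 't', 'c']

# re.S affects only '.': it also matches '\n'.
braces_default = re.findall(r'\{.*?\}', text)
braces_multiline = re.findall(r'\{.*?\}', text, flags=re.M)
braces_dotall = re.findall(r'\{.*?\}', text, flags=re.S)
print(braces_default)    # []
print(braces_multiline)  # []
print(braces_dotall)     # ['{one\ntwo}']
```

Since the pattern in the question has no anchors, re.M has no effect on it at all.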

Upvotes: 2
