Xun Yang

Reputation: 4419

Regular expression: how to match a string containing "\n" (newline)?

I'm trying to extract data from a SQL export file with a regular expression. To match the post content field, I use '(?P<content>.*?)'. It works fine most of the time, but if the field contains '\n', the regular expression doesn't match. How can I modify the regular expression so that it matches these fields too? Thanks!

Example (I'm using Python):

>>> import re
>>> re.findall("'(?P<content>.*?)'","'<p>something, something else</p>'")
['<p>something, something else</p>']

>>> re.findall("'(?P<content>.*?)'","'<p>something, \n something else</p>'")
[]

P.S. It seems that any character preceded by '\' is treated as an escape sequence. How can I tell the regex to treat such sequences literally?

Upvotes: 15

Views: 23602

Answers (2)

stema

Reputation: 92976

You need the DOTALL modifier, which makes the dot also match newline characters.

re.S
re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

See the re module documentation on docs.python.org.
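As a minimal sketch of the flag's effect, using the pattern and input from the question: the variable names below are only illustrative, and the inline form (?s) is an equivalent spelling of the same flag that the question did not use.

```python
import re

text = "'<p>something, \n something else</p>'"
pattern = "'(?P<content>.*?)'"

# Default behaviour: '.' does not match the newline, so nothing is found.
no_flag = re.findall(pattern, text)

# re.DOTALL (short alias: re.S) lets '.' match the newline as well.
with_flag = re.findall(pattern, text, re.DOTALL)

# The inline flag (?s) at the start of the pattern has the same effect.
inline_flag = re.findall("(?s)" + pattern, text)

print(no_flag)
print(with_flag)
print(inline_flag)
```

The inline form can be handy when the pattern is stored as plain text somewhere the flags argument cannot be passed along.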

Upvotes: 4

Adam Zalcman

Reputation: 27233

You should use the re.DOTALL option:

>>> re.findall("'(?P<content>.*?)'","'<p>something, \n something else</p>'", re.DOTALL)
['<p>something, \n something else</p>']

See this.
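On the P.S. in the question: the '\n' is converted by Python's string literal syntax, not by the regex engine. A rough sketch of the difference, assuming short made-up sample strings: in a normal literal '\n' becomes a single newline character, while in a raw string (r"...") it stays as a backslash followed by 'n', which '.' matches without any flag.

```python
import re

pattern = "'(?P<content>.*?)'"

escaped = "'a \n b'"   # \n is one newline character
raw = r"'a \n b'"      # backslash followed by 'n': two ordinary characters

escaped_matches = re.findall(pattern, escaped)
raw_matches = re.findall(pattern, raw)

print(len(escaped), len(raw))
print(escaped_matches)
print(raw_matches)
```

If the export file itself contains the two characters backslash and 'n', reading it from disk gives the second form, so the original pattern may already match there; re.DOTALL is needed only when the field contains real line breaks.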

Upvotes: 34
