Reputation: 4419
I'm trying to dump data from a SQL export file with a regular expression. To match the post content field, I use '(?P<content>.*?)'. It works fine most of the time, but if the field contains the string '\n', the regular expression doesn't match. How can I modify the regular expression to match such fields? Thanks!
Example (I'm using Python):
>>> re.findall("'(?P<content>.*?)'","'<p>something, something else</p>'")
['<p>something, something else</p>']
>>> re.findall("'(?P<content>.*?)'","'<p>something, \n something else</p>'")
[]
P.S. Seemingly all sequences starting with '\' are treated as escape sequences. How can I tell the regex to treat them literally?
Upvotes: 15
Views: 23602
Reputation: 92976
You need the DOTALL modifier to make the dot also match newline characters.
re.S
re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
See it here on docs.python.org
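As a rough sketch of what the flag changes, the snippet below runs the pattern from the question three ways: without a flag, with re.S (the short alias of re.DOTALL), and with the inline (?s) flag placed at the start of the pattern. The variable names are only illustrative.

```python
import re

# Pattern and input taken from the question.
pattern = "'(?P<content>.*?)'"
text = "'<p>something, \n something else</p>'"

# Without DOTALL, '.' stops at the newline, so no quoted span is found.
no_flag = re.findall(pattern, text)

# re.S is the short alias of re.DOTALL.
with_flag = re.findall(pattern, text, re.S)

# The inline flag (?s) at the start of the pattern has the same effect.
inline = re.findall("(?s)" + pattern, text)

print(no_flag)
print(with_flag)
print(inline)
```

The inline form can be handy when the pattern is stored separately from the code that calls re.findall, since the flag travels with the pattern.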
Upvotes: 4
Reputation: 27233
You should use the DOTALL option:
>>> re.findall("'(?P<content>.*?)'","'<p>something, \n something else</p>'", re.DOTALL)
['<p>something, \n something else</p>']
See this.
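On the P.S. in the question: in a normal Python string literal, "\n" is converted to a single newline character before the regex engine ever sees it. A raw string literal (r"...") keeps the backslash and the 'n' as two separate characters. The sketch below, with made-up sample strings, shows the difference; it assumes the goal is to compare the two kinds of literal, not to describe how the SQL export file itself is read.

```python
import re

pattern = "'(?P<content>.*?)'"

# Normal literal: \n becomes one newline character.
escaped = "'a \n b'"

# Raw literal: backslash and 'n' stay as two ordinary characters.
raw = r"'a \n b'"

# The raw string is one character longer than the escaped one.
print(len(escaped), len(raw))

# '.' does not match the real newline without DOTALL, so this finds nothing.
escaped_matches = re.findall(pattern, escaped)

# No real newline here, so the default '.' matches the whole field.
raw_matches = re.findall(pattern, raw)

print(escaped_matches)
print(raw_matches)
```

If the export file really contains newline characters inside fields, re.DOTALL is still the fix; raw strings only change how literals typed into Python source are interpreted.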
Upvotes: 34