rlantz

Reputation: 75

python regex search pattern

I'm searching a block of text for a newline followed by a period.

pat = '\n\.'
block = 'Some stuff here. And perhaps another sentence here.\n.Some more text.'

For some reason when I use regex to search for my pattern it changes the value of pat (using Python 2.7).

import re
mysrch = re.search(pat, block)

Now the value of pat has been changed to:

'\n\\.'

Which is messing with the next search that I use pat for. Why is this happening, and how can I avoid it?

Thanks very much in advance.

Upvotes: 0

Views: 170

Answers (1)

katharos

Reputation: 41

The extra backslash isn't actually part of the string; the string itself hasn't changed at all.

Here's an example:

>>> pat = '\n\.'
>>> pat
'\n\\.'
>>> print pat

\.

As you can see, when you print pat, it contains only one \. When you echo a string at the interactive prompt, Python shows its repr (the result of __repr__), which is designed to show unambiguously what is in the string, so special characters appear in their escaped form. Just as \n is the escaped form of a newline, \\ is the escaped form of a single \.
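If you want to convince yourself, you can compare the string's length and individual characters with its repr. This is only a minimal sketch: it spells the pattern as '\n\\.' (the same three characters as the '\n\.' in the question) and calls print with parentheses so that it should also run on Python 3, which is an assumption beyond the Python 2.7 setup in the question.

```python
# Same three-character string as the question's '\n\.'; the backslash is
# written as '\\' so the literal is unambiguous on any Python version.
pat = '\n\\.'

print(len(pat))    # 3: a newline, one backslash, a period
print(list(pat))   # the individual characters
print(repr(pat))   # the escaped form that the interactive prompt echoes
```

The length is 3 even though the repr shows two backslashes, because the doubled backslash is only how a single backslash is displayed.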

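Note that your pattern contains an actual newline character, not the two-character sequence "\n" (as a repr: "\\n"). The re module matches a newline in either case, so the search itself still works, but it is cleaner to pass the backslashes through untouched and let the regex engine interpret them.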

Either make your regex a raw string (as suggested in the comments):

>>> pat = r"\n\."
>>> pat
'\\n\\.'
>>> print pat
\n\.

Or escape the backslashes yourself:

pat = "\\n\\."
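
As a rough check, the sketch below compares the two spellings and runs each of them, plus the pattern from the question, against the sample block. The names raw_pat, escaped_pat, newline_pat and matches are made up for this illustration, and print is called with parentheses so it should run on Python 3 as well as 2.7.

```python
import re

block = 'Some stuff here. And perhaps another sentence here.\n.Some more text.'

raw_pat = r"\n\."       # backslash, n, backslash, period (4 characters)
escaped_pat = "\\n\\."  # the same 4 characters, escaped by hand
newline_pat = "\n\\."   # the question's pattern: real newline, backslash, period

# The raw and hand-escaped spellings produce the same string.
print(raw_pat == escaped_pat)

# All three patterns find the newline followed by a period.
matches = [re.search(p, block) for p in (raw_pat, escaped_pat, newline_pat)]
for m in matches:
    print(m.start(), repr(m.group()))
```

None of the searches modifies the pattern string, so pat can safely be reused for later searches.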

Upvotes: 1
