Reputation: 12462

Python string literal to regex object

I have a function that returns the string "r'^A Plat'", which is read from a text file

def get_Pat(file):
    # processes the text file and currently returns "r'^A Plat'"
    ...

Originally, I had the pattern hard-coded:

pat = r'^A Plat'
use(pat)

Now I have:

pat = get_Pat(file)
use(pat)

But it's complaining, I suppose because it's a string instead of a regex object.

I have tried

re.escape(get_Pat(file))

and

re.compile(get_Pat(file))

but neither of them works.

How do I convert a string literal into a regex object?

Is r'^A Plat' equivalent to simply re.compile("A Plat")? Dumb question, maybe.

It would work if it were use("^A Plat").
It doesn't work if it's use("r'^A Plat'") <--- what get_Pat(file) is spitting out

I suppose my task is simply transforming the string r'^A Plat' into ^A Plat.
But I feel like that's just a cheap hack.

Upvotes: 3

Answers (4)

eyquem

Reputation: 27585

from ast import literal_eval
pat = literal_eval(get_Pat(file))
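
For illustration, here is a minimal sketch of how the two lines above fit together with re.compile(). It assumes the returned text is exactly r'^A Plat'; the name text_from_file is a hypothetical stand-in for the value of get_Pat(file).

```python
import re
from ast import literal_eval

# Hypothetical stand-in for what get_Pat(file) returns
text_from_file = "r'^A Plat'"

# literal_eval parses the text as a Python literal,
# so the r prefix and the surrounding quotes disappear
pattern_text = literal_eval(text_from_file)
regex = re.compile(pattern_text)

print(pattern_text)                        # ^A Plat
print(bool(regex.match("A Plat wins")))    # True
```

literal_eval() only accepts Python literals, so a line that is not a valid literal raises an exception instead of being evaluated.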

EDIT

aelon,

As you wrote in a comment that you can't import literal_eval(), my solution above is useless for you. Besides, although they give interesting information, the other answers didn't bring another solution.
So I propose a new one that doesn't use literal_eval().

import re

detect = re.compile("r(['\"])(.*?)\\1[ \t]*$")

with open('your_file.txt') as f:
    pat = f.readline()

m = detect.match(pat)
if m:
    r = re.compile(m.group(2))
else:
    # no r'...' wrapper: use the line itself, minus its line ending
    r = re.compile(pat.rstrip('\r\n'))

Explanations:

Suppose the sequence of characters r'^Six o\'clock\nJim' is written as the first line of *your_file*

Opening the file and reading its first line creates an object pat
- its TYPE is <type 'str'> in Python 2 and <class 'str'> in Python 3
- its REPRESENTATION is "r'^Six o\'clock\nJim'"
- its VALUE is r'^Six o\'clock\nJim' , that is to say the sequence of characters r , ' , ^ , S , i , x , (space) , o , \ , ' , c , l , o , c , k , \ , n , J , i , m , '
There may also be the "character" \n at the end if there is a second line in the file. And there may also be blanks or tabs, who knows, between the end of r'^Six o\'clock\nJim' written in the file and the end of its line. That's why the regex pattern that defines detect ends with [ \t]*$.
So the characters of interest may be followed by additional blanks, tabs and a newline, and printing tuple(pat) would then show, for example:

('r', "'", '^', 'S', 'i', 'x', ' ', 'o', '\\', "'", 'c', 'l', 'o', 'c', 'k', '\\', 'n', 'J', 'i', 'm', "'", ' ', ' ', ' ', '\t', '\n')

Now, let us consider the object obtained with the expression detect.match(pat).group(2).
Its value is ^Six o\'clock\nJim , composed of 18 characters. \ and ' and n are three distinct characters among them: the value contains neither a single escaped character \' nor a single escaped character \n.
This value is exactly the same as the one we would obtain for an object named rawS by writing the instruction rawS = r'^Six o\'clock\nJim'
So we can obtain the regex whose pattern is written in a file in the form r'....' by writing directly r = re.compile(detect.match(pat).group(2))
In my example, the only sequences in the characters written in the file are \' and \n. But everything above is valid for any of the escape sequences of the language.

In other words, we don't need a function that would turn the STRING "r'^Six o\'clock\nJim'" of value r'^Six o\'clock\nJim' into the result of the EXPRESSION r'^Six o\'clock\nJim' ,
because the string captured by detect.match(pat).group(2) already has the value of r'^Six o\'clock\nJim'.
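
The claim above can be checked with a short sketch. The variable line is a hypothetical stand-in for the first line read from the file, with trailing blanks, a tab and a newline; detect is the pattern defined earlier in this answer.

```python
import re

detect = re.compile("r(['\"])(.*?)\\1[ \t]*$")

# Simulated first line of the file: the literal text, then blanks, a tab, a newline
line = "r'^Six o\\'clock\\nJim'   \t\n"

captured = detect.match(line).group(2)

# The same characters written as a raw string literal in source code
raw_s = r'^Six o\'clock\nJim'

print(captured == raw_s)   # True
print(len(captured))       # 18

# Used as a regex, \' matches a quote and \n matches a newline
regex = re.compile(captured)
print(bool(regex.match("Six o'clock\nJim")))   # True
```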

Nota Bene

In Python 2, the type <type 'str'> is a byte string type, covering only a limited repertoire of characters.
It is the type of the content read from a file, whether the file is opened with mode 'r' or with mode 'rb'.

In Python 3, the type <class 'str'> covers the Unicode characters.
But contrary to Python 2, the content read from a file opened with mode 'r' is of type <class 'str'>
while it is of type <class 'bytes'> if the file is opened with mode 'rb'.

So I think the above code works as well in Python 3 as in Python 2, provided that the file is opened with mode 'r'.

If the file should be opened with 'rb' the regex pattern should be changed to b"r(['\"])(.*?)\\1[ \t]*\r?\n".
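
A rough sketch of the 'rb' variant in Python 3, using a bytes object in place of a real file. The sample line and the ASCII decoding are assumptions made for the illustration. Note that this pattern requires a newline after the closing quote, so a last line without a line ending falls through to the fallback.

```python
import re

# Bytes pattern from the line above, for content read in 'rb' mode
detect_bytes = re.compile(b"r(['\"])(.*?)\\1[ \t]*\r?\n")

# Hypothetical first line as returned by a file opened with 'rb'
line = b"r'^A Plat'  \r\n"

m = detect_bytes.match(line)
captured = m.group(2) if m else line.rstrip(b"\r\n")

bytes_regex = re.compile(captured)                   # can only search bytes
text_regex = re.compile(captured.decode("ascii"))    # can only search str

print(captured)                                   # b'^A Plat'
print(bool(bytes_regex.match(b"A Plat wins")))    # True
print(bool(text_regex.match("A Plat wins")))      # True
```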

AFAIHU

Upvotes: 2

bgporter

Reputation: 36524

Not sure what you mean by 'none of them works', but re.compile() is what you're looking for:

>>> import re
>>> def getPat():
...     return r'^A Plat'
...
>>> getPat()
'^A Plat'
>>> reObj = re.compile(getPat())
>>> reObj
<_sre.SRE_Pattern object at 0x16cfa18>
>>> reObj.match("A Plat")
<_sre.SRE_Match object at 0x16c3058>
>>> reObj.match("foo")

edit:

You can get rid of the extra r' ' cruft after it's returned with this code:

>>> s = "r'^A Plat'"
>>> s = s[1:].strip("'")
>>> s
'^A Plat'
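
If the text might also use double quotes, or might not be wrapped at all, a slightly more defensive version of the same idea could look like the sketch below. The helper name strip_raw_prefix is hypothetical, not something from the question.

```python
import re


def strip_raw_prefix(text):
    """Return the body of text shaped like r'...' or r"...", else the text unchanged."""
    text = text.strip()
    looks_wrapped = (
        len(text) >= 3
        and text[0] in "rR"
        and text[1] in "'\""
        and text[-1] == text[1]
    )
    return text[2:-1] if looks_wrapped else text


print(strip_raw_prefix("r'^A Plat'"))    # ^A Plat
print(strip_raw_prefix('r"^A Plat"'))    # ^A Plat
print(strip_raw_prefix("^A Plat"))       # ^A Plat

regex = re.compile(strip_raw_prefix("r'^A Plat'\n"))
print(bool(regex.match("A Plat wins")))  # True
```

Unlike strip("'"), this removes exactly one quote from each end, so a pattern that itself ends in a quote keeps it.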

Upvotes: 2

Noelkd

Reputation: 7906

According to the comment in your get_Pat function, it's returning:

"r'^A Plat'"

Which is not what you thought you were getting:

>>> import re
>>> x = re.compile("r'^A Plat'")
>>> y = "A Plat wins"
>>> x.findall(y)
[]
>>> x = re.compile("^A Plat")
>>> x.findall(y)
['A Plat']
>>>

So the regex you're using isn't r'^A Plat', it's "r'^A Plat'". r'^A Plat' itself is fine:

>>> x = re.compile(r'^A Plat')
>>> x.findall(y)
['A Plat']

To fix this I would have to understand how you were getting the string "r'^A Plat'" in the first place.

Upvotes: 1

John Kugelman

Reputation: 361849

r'^A Plat' is identical to '^A Plat' without the r. The r stands for raw, not regex. It lets you write strings with special characters like \ without having to escape them.

>>> r'^A Plat'
'^A Plat'
>>> r'/ is slash, \ is backslash'
'/ is slash, \\ is backslash'
>>> r'write \t for tab, \n for newline, \" for double quote'
'write \\t for tab, \\n for newline, \\" for double quote'

Raw strings are commonly used when writing regexes since regexes often contain backslashes that would otherwise need to be escaped. r does not create regex objects, though.
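
A small sketch to make the distinction concrete. The sample pattern \d+ and the test sentence are just illustrations.

```python
import re

# A raw string literal is an ordinary str; only the spelling in source code differs
same_without_backslash = (r'^A Plat' == '^A Plat')
same_with_backslash = (r'\d+' == '\\d+')
print(same_without_backslash, same_with_backslash)   # True True
print(type(r'\d+').__name__)                         # str

# The regex object only comes into existence through re.compile()
number = re.compile(r'\d+')
print(number.findall("3 cats, 12 dogs"))             # ['3', '12']
```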

From the Python manual:

§ 2.4.1. String literals

String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences.

...

Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.

Upvotes: 2
