demo.b

Reputation: 3449

Wrong output format with re.sub

I expected this code to print "ez_setup.py", but instead it printed "ez_setup\x01". Could someone please point me in the right direction?

In [7]: url = 'http://sourceforge.net/p/mysql-python/mysqldb-2/ci/default/tree/ez_setup.py?format=raw'
In [8]: url_split = url.split('/')
In [9]: for item in url_split:
   ...:     if ".py" in item:
   ...:         file_name = re.sub(r"(.py).+", "\1", item)

In [10]: file_name
Out[10]: 'ez_setup\x01'

Upvotes: 1

Views: 122

Answers (1)

user2555451

You need to use a raw string for \1:

file_name = re.sub(r"(.py).+", r"\1", item)
#                              ^

Otherwise, Python interprets \1 in a normal string literal as an octal escape sequence (the character with code 1) before re.sub ever sees it:

>>> '\1'
'\x01'
>>> r'\1'
'\\1'
>>>
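To see the effect on the substitution itself, here is a minimal sketch using the item from the question. The variable names broken, fixed, and alt are only for illustration, and the \g<1> form is shown as an equivalent way to write the group reference:

```python
import re

item = "ez_setup.py?format=raw"

# In a normal string literal, "\1" is an octal escape for the character with code 1.
assert "\1" == "\x01" == chr(1)

# A raw string keeps the backslash, so re.sub receives a real group reference.
assert r"\1" == "\\1"

broken = re.sub(r"(.py).+", "\1", item)      # replacement is the control character
fixed = re.sub(r"(.py).+", r"\1", item)      # replacement is group 1
alt = re.sub(r"(.py).+", r"\g<1>", item)     # same as r"\1", written explicitly

print(repr(broken))  # 'ez_setup\x01'
print(repr(fixed))   # 'ez_setup.py'
print(repr(alt))     # 'ez_setup.py'
```

Writing the replacement as "\\1" (a doubled backslash in a normal string) would work as well; the raw string is just easier to read.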

Note also that . is a special character in regex patterns: it matches any character (except a newline, by default). I think you meant to escape it before py:

file_name = re.sub(r"(\.py).+", r"\1", item)

Now Python will match a literal period.
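As a rough illustration of why the escape matters, the sketch below runs the corrected loop on the URL from the question and then compares the escaped and unescaped patterns on a made-up item, "happy.py?format=raw". Both that item and the helper name strip_after_py are assumptions for the example, not part of the original code:

```python
import re

def strip_after_py(item):
    # Hypothetical helper: drop everything after a literal ".py".
    return re.sub(r"(\.py).+", r"\1", item)

url = 'http://sourceforge.net/p/mysql-python/mysqldb-2/ci/default/tree/ez_setup.py?format=raw'
file_name = None
for item in url.split('/'):
    if ".py" in item:
        file_name = strip_after_py(item)

# Made-up item where "ppy" appears before the real extension.
sample = "happy.py?format=raw"
unescaped = re.sub(r"(.py).+", r"\1", sample)   # "." matches the first "p" of "ppy"
escaped = re.sub(r"(\.py).+", r"\1", sample)    # only a literal period matches

print(file_name)  # ez_setup.py
print(unescaped)  # happy
print(escaped)    # happy.py
```

With the unescaped dot, the match starts too early and the extension is lost; with the escaped dot, only the text after ".py" is removed.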

Upvotes: 1
