Reputation: 365
I'm seeing strange behavior with the re.sub() function in Python.
In a string, I want to replace all occurrences like
- list 1
- list 2
with HTML code like
<li>list 1</li>
<li>list 2</li>
So I use
text = re.sub('(- (?P<id>.))', '<li>\g<id></li>', text)
It works and returns
<li>l</li>ist 1
<li>l</li>ist 2
Then I add + in the regex to match the whole sentence (i.e. "list 1", "list 2")
text = re.sub('(- (?P<id>.+))', '<li>\g<id></li>', text)
And surprisingly, it returns
</li>ist 1
</li>ist 2
The text after \g<id> is overwriting the left part of the string.
If I try <li>\g<id>foo instead, it returns foot 1
Has anyone already faced this behaviour? Is there something I'm missing here?
Thanks
Upvotes: 0
Views: 98
Reputation: 168836
Your input file has carriage returns ('\r') at the end of the lines. So, the first input line is like:
- list 1\r\n
Since \r moves the cursor to the beginning of the current line, and \n moves to the beginning of the next line, you can print that string and not notice.
After the substitution, your line looks like:
<li>list 1\r</li>\n
This causes the </li> to appear at the beginning of the current line when printed, overwriting the characters that were already there.
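A quick way to confirm this is to look at repr() of the result rather than printing it, since repr() shows control characters as escapes instead of letting the terminal act on them. The sketch below assumes the input uses Windows-style \r\n line endings; the sample string is made up for illustration.

```python
import re

# Assumed sample input with \r\n line endings (not the asker's actual file).
text = "- list 1\r\n- list 2\r\n"

# '.' matches every character except '\n', so '.+' also swallows the '\r'.
result = re.sub(r'(- (?P<id>.+))', r'<li>\g<id></li>', text)

# repr() makes the stray '\r' before each '</li>' visible.
print(repr(result))
```

If the output shows \r just before each </li>, the carriage return is what makes the closing tag land at the start of the line in the terminal.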
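You have a couple of possible solutions:
1. Strip the \r on input
2. Exclude \r from the character class that you match
An example of the first would be to open the text file with open(fname, 'rU'). (In Python 3, text mode already translates line endings by default, and the 'U' flag was removed in Python 3.11, so a plain open(fname) has the same effect.)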
An example of the second would be re.sub(r'(- (?P<id>[^\r\n]+))', r'<li>\g<id></li>', text)
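Both approaches can be compared side by side on an in-memory string. This is only a sketch: the sample text is assumed, and str.replace stands in for the newline translation that opening the file in text mode would normally do.

```python
import re

# Assumed sample input with \r\n line endings.
text = "- list 1\r\n- list 2\r\n"

# Option 1: normalise line endings first, then use the original pattern.
normalized = text.replace("\r\n", "\n")
fixed_on_input = re.sub(r'- (?P<id>.+)', r'<li>\g<id></li>', normalized)

# Option 2: leave the input alone and keep '\r' and '\n' out of the match.
fixed_by_pattern = re.sub(r'- (?P<id>[^\r\n]+)', r'<li>\g<id></li>', text)

print(repr(fixed_on_input))
print(repr(fixed_by_pattern))
```

Note that the second option leaves the original \r\n after each </li>, which is harmless when printed but still present in the string.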
Upvotes: 2