Koffee

Reputation: 365

re.sub() overrides my string

I'm seeing strange behavior with the re.sub() function in Python.

In a string, I want to replace all occurrences like

- list 1
- list 2

with HTML code like

<li>list 1</li>
<li>list 2</li>

So I use

text = re.sub(r'(- (?P<id>.))', r'<li>\g<id></li>', text)

It works and returns

<li>l</li>ist 1
<li>l</li>ist 2

Then I add + to the regex to match the whole item (i.e. "list 1", "list 2")

text = re.sub(r'(- (?P<id>.+))', r'<li>\g<id></li>', text)

And surprisingly, it returns

</li>ist 1
</li>ist 2

The text after \g<id> overwrites the left part of the string.

If I try <li>\g<id>foo instead, it returns foot 1

Has anyone run into this behaviour before? Is there something I'm missing here?

Thanks

Upvotes: 0

Views: 98

Answers (1)

Robᵩ

Reputation: 168836

Your input file has carriage returns ('\r') at the end of the lines. So, the first input line is like:

 - list 1\r\n

Since \r moves the cursor to the beginning of the current line, and \n moves to the beginning of the next line, you can print that string and not notice.

After the substitution, your line looks like:

<li>list 1\r</li>\n

This causes the </li> to appear at the beginning of the current line when printed.
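
To see the hidden character for yourself, here is a minimal sketch. The sample string is an assumption standing in for the real file contents, which the question does not show. Printing with repr() makes the '\r' visible instead of letting the terminal act on it.

```python
import re

# Hypothetical input with Windows-style line endings, as assumed above.
text = "- list 1\r\n- list 2\r\n"

# '.' matches any character except '\n', so '.+' also swallows the trailing '\r'.
result = re.sub(r'(- (?P<id>.+))', r'<li>\g<id></li>', text)

# repr() shows control characters that print() would silently act on.
print(repr(result))
```

The '\r' ends up inside the <li> element, just before the closing tag, which is why the terminal draws </li> over the start of the line.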

You have a couple of possible solutions:

  • Strip the \r on input
  • Exclude \r from the character class that you match

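An example of the first would be to open the text file with open(fname, 'rU') on Python 2. On Python 3, text mode already translates '\r\n' to '\n' by default, and the 'U' flag was removed in Python 3.11, so a plain open(fname) is enough.

An example of the second would be re.sub(r'(- (?P<id>[^\r\n]+))', r'<li>\g<id></li>', text).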
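
Both approaches can be compared side by side in a rough sketch. The sample string and the variable names are assumptions for illustration, and str.replace() is used here as a stand-in for stripping '\r' at read time when the text is already in memory.

```python
import re

# Hypothetical sample input with '\r\n' line endings.
raw = "- list 1\r\n- list 2\r\n"

# Option 1: normalize line endings before matching.
normalized = raw.replace("\r\n", "\n")
fixed_by_strip = re.sub(r'(- (?P<id>.+))', r'<li>\g<id></li>', normalized)

# Option 2: leave the input alone but stop the match before '\r' or '\n'.
fixed_by_class = re.sub(r'(- (?P<id>[^\r\n]+))', r'<li>\g<id></li>', raw)

print(repr(fixed_by_strip))
print(repr(fixed_by_class))
```

Note that the second option keeps the original '\r\n' line endings in the output; they simply land after </li> instead of inside the element.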

Upvotes: 2
