re.sub() overrides my string

Question

I'm facing a strange behavior with the re.sub() function in Python.

In a string, I want to replace all occurrences like

- list 1
- list 2

with HTML code like

<li>list 1</li>
<li>list 2</li>

So I use

text = re.sub(r'(- (?P<name>.))', r'<li>\g<name></li>', text)

It works and returns

<li>l</li>ist 1
<li>l</li>ist 2

Then I add + to the regex to match the whole sentence (i.e. "list 1", "list 2")

text = re.sub(r'(- (?P<name>.+))', r'<li>\g<name></li>', text)

And surprisingly, it returns

</li>ist 1
</li>ist 2

The text after \g<name> is overwriting the left part of the string.

If I try

\g<name>foo instead, it returns foot 1

Has anyone come across this behaviour before? Is there something I'm missing here?

Thanks

Robᵩ · Accepted Answer

Your input file has carriage returns ('\r') at the end of the lines. So, the first input line is like:

- list 1\r\n

Since \r moves the cursor to the beginning of the current line, and \n moves to the beginning of the next line, you can print that string and not notice.

After the substitution, your line looks like:

<li>list 1\r</li>\n

This causes the </li> to appear at the beginning of the current line when printed, overwriting the first five characters of <li>list 1.
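To see the hidden carriage return, it helps to print the repr() of the result instead of the string itself. The snippet below is a minimal sketch: the sample text and the group name `name` are assumptions made for illustration, not taken from the asker's real file.

```python
import re

# Hypothetical input with Windows-style line endings (CR LF).
text = "- list 1\r\n- list 2\r\n"

# '.' matches every character except '\n', so the '\r' is captured too.
result = re.sub(r'(- (?P<name>.+))', r'<li>\g<name></li>', text)

# repr() shows the control characters that print() hides.
print(repr(result))
```

The '\r' shows up inside the tags, just before each closing </li>.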

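You have a couple of possible solutions:

1. Read the input file in a way that converts the line endings to '\n'.
2. Exclude the '\r' from the part of the text that the regular expression matches.

An example of the first would be to open the text file with open(fname, 'rU').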
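The 'rU' mode belongs to Python 2; Python 3.11 removed the 'U' flag. In Python 3, plain text mode with the default newline=None already translates '\r\n' to '\n' on reading. A rough sketch, using a temporary file with made-up contents to stand in for the real input:

```python
import os
import tempfile

# Write a hypothetical sample file in binary mode so the CR LF bytes
# land on disk exactly as given.
fd, fname = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"- list 1\r\n- list 2\r\n")

# Text mode with the default newline=None converts '\r\n' to '\n'.
with open(fname, "r") as f:
    text = f.read()

os.remove(fname)
print(repr(text))
```

With the carriage returns gone at read time, the original regex with .+ no longer captures a '\r'.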

An example of the second would be re.sub(r'(- (?P<name>[^\r]+))', r'<li>\g<name></li>', text)
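A small sketch of this second approach, again on assumed sample text and with `name` as a placeholder group name:

```python
import re

# Hypothetical input with Windows-style line endings (CR LF).
text = "- list 1\r\n- list 2\r\n"

# [^\r]+ stops right before the carriage return, so the '\r' is left
# outside the match and ends up after the closing tag.
fixed = re.sub(r'(- (?P<name>[^\r]+))', r'<li>\g<name></li>', text)

print(repr(fixed))
```

The '\r' is still in the string, but it now sits directly before the '\n', where it no longer overwrites anything visible when printed. Note that [^\r]+ also matches '\n', so this pattern relies on every line actually ending in '\r\n'.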
