Reputation: 52183
I have a string like this:
Hi. My name is _John_. I am _20_ years old.
and I'd like to convert it into this:
Hi. My name is <b>John</b>. I am <b>20</b> years old.
I tried something like this, but with no luck:
import re
text = "Hi. My name is _John_. I am _20_ years old."
pattern = "(.*)(\_)(.*)(\_)(.*)"
re.sub(pattern, r'\1<b>\3</b>\5', text)
'Hi. My name is _John_. I am <b>20</b> years old.'
What is wrong with the pattern? Why is it not seeing the first bold text?
Any help would be appreciated. Thanks.
Upvotes: 2
Views: 429
Reputation: 73658
Have you tried using string templates? They were built for exactly this kind of simple string substitution, and they are a lot cleaner and more elegant than regexes.
import string
new_style = string.Template('Hi. My name is $name. I am $age years old.')
# Template does not support the % operator; substitute() fills in the placeholders.
print(new_style.substitute(name='<b>John</b>', age='<b>20</b>'))  # produces what you want
For more string template examples, check this ActiveState link.
Upvotes: 3
Reputation: 27038
The problem is that the first .* in your pattern is eating everything to the left of the last possible match. That is why * is said to be greedy. Use a non-greedy pattern:
pattern='_(.+?)_'
re.sub(pattern, r'<b>\1</b>', text)
The ? makes the match non-greedy, that is, as short as possible. The + requires at least one character between the two underscores for the text to be replaced with <b>text</b>, so __ will remain __. If you would like __ to become <b></b>, use .*? instead.
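To make the difference between .+? and .*? concrete, here is a small sketch. The sample text with a trailing empty pair of underscores is an assumption added for illustration; it is not part of the original question.

```python
import re

# Sample text: the question's string plus an empty "__" pair (added for illustration).
text = "Hi. My name is _John_. I am _20_ years old. Empty: __"

# .+? needs at least one character between the underscores, so "__" is left alone.
plus_result = re.sub(r'_(.+?)_', r'<b>\1</b>', text)

# .*? also accepts zero characters, so "__" becomes "<b></b>".
star_result = re.sub(r'_(.*?)_', r'<b>\1</b>', text)

print(plus_result)
print(star_result)
```

Which of the two you want depends on whether an empty pair of underscores should count as markup in your input.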
Upvotes: 3
Reputation: 174662
This sounds remarkably like Markdown syntax, so if your goal is to parse that, a Python library for it already exists.
Upvotes: 1
Reputation: 16115
Change to:
pattern = "_([^_]*)_"
re.sub(pattern, r'<b>\1</b>', text)
Also see this example.
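As a rough sketch of how the negated character class behaves, the snippet below wraps the substitution in a helper. The name bold_underscored and the two-line sample string are hypothetical, added only for illustration. One difference from the lazy .*? form is that [^_] also matches newlines, whereas . does not unless re.DOTALL is set.

```python
import re

def bold_underscored(text):
    """Hypothetical helper: wrap _..._ spans in <b> tags using a negated class."""
    # [^_]* can never run past an underscore, so no lazy quantifier is needed.
    return re.sub(r'_([^_]*)_', r'<b>\1</b>', text)

result = bold_underscored("Hi. My name is _John_. I am _20_ years old.")
print(result)

# [^_] matches a newline, so a span broken across lines is still converted...
multiline = "_first\nsecond_"
negated_multiline = bold_underscored(multiline)

# ...while . stops at the newline by default, so the lazy form leaves it alone.
lazy_multiline = re.sub(r'_(.*?)_', r'<b>\1</b>', multiline)
```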
Upvotes: 4
Reputation: 40414
The problem is that * is greedy and consumes as many characters as possible (including more _ characters). To fix that, you can use the non-greedy alternative *? as follows:
>>> pattern = r'_(.*?)_'
>>> replacement = r'<b>\1</b>'
>>> re.sub(pattern, replacement, text)
'Hi. My name is <b>John</b>. I am <b>20</b> years old.'
Note that re.sub behaves like re.search instead of re.match. That is, you can use a pattern that matches only part of the input (in this case, just some text surrounded by _) instead of something that matches the whole line.
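The search-versus-match point can be checked directly. This is a minimal sketch that reuses the question's text; the variable names are arbitrary.

```python
import re

text = "Hi. My name is _John_. I am _20_ years old."
pattern = r'_(.*?)_'

# re.match only tries at the start of the string, where there is no underscore.
anchored = re.match(pattern, text)

# re.search scans forward and finds the first underscored span.
first = re.search(pattern, text)

# re.findall (like re.sub) visits every non-overlapping match in the string.
all_groups = re.findall(pattern, text)

print(anchored)
print(first.group(0))
print(all_groups)
```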
Upvotes: 4
Reputation: 16252
It's because the pattern is greedy, and the first (.*) matches the text from the beginning all the way to the third _:
>>> re.match(pattern, text).groups()
('Hi. My name is _John_. I am ', '_', '20', '_', ' years old.')
Here is a simplified, non-greedy version:
>>> re.sub('_(.+?)_', r'<b>\1</b>', text)
'Hi. My name is <b>John</b>. I am <b>20</b> years old.'
Upvotes: 2