Reputation: 71
My previous example was not clear, so here is another one:
a = '123 - 48 <!-- 456 - 251 - --> 452 - 348'
If I do something like:
[el for el in re.split(r' - ',a)]
I get:
['123', '48 <!-- 456', '251', '--> 452', '348']
But I want this:
['123', '48 <!-- 456 - 251 - --> 452', '348']
Thanks...
Upvotes: 1
Views: 498
Reputation: 24740
The result you posted is the output of re.findall(r'(\d+)', a).
re.findall(r'(?:\<\!--.+\d+.+--\>)|(\d+)', a)
['123', '48', '', '452', '348']
list(filter(None, re.findall(r'(?:\<\!--.+\d+.+--\>)|(\d+)', a)))
['123', '48', '452', '348']
Upvotes: -1
Reputation: 33928
If you want one regex you could use something like:
(\d+)(?!(?:[^<]+|<(?!!--))*-->)
This works as long as there are no "invalid" --> markers.
It matches numbers that are not followed by --> without a <!-- in between.
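As a rough sketch of how the lookahead above behaves on the string from the question: the same lookahead can be attached either to \d+ (to collect the numbers outside the comment) or to the ' - ' separator (to split only outside the comment). The name NOT_IN_COMMENT is just an illustrative label, and this assumes comments are well formed and not nested.

```python
import re

a = '123 - 48 <!-- 456 - 251 - --> 452 - 348'

# Lookahead taken from the pattern above: it rejects a position when a "-->"
# can be reached from there without first passing an opening "<!--".
NOT_IN_COMMENT = r'(?!(?:[^<]+|<(?!!--))*-->)'

# Numbers that are not inside a comment.
numbers = re.findall(r'\d+' + NOT_IN_COMMENT, a)

# Split on ' - ' only where the separator is not inside a comment.
parts = re.split(r' - ' + NOT_IN_COMMENT, a)

print(numbers)  # ['123', '48', '452', '348']
print(parts)    # ['123', '48 <!-- 456 - 251 - --> 452', '348']
```

The second call leaves the comment in place, which is the list the question asks for.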
Upvotes: 0
Reputation: 20424
First remove the comments using something like this:
re.sub("<!--.*?-->", "", your_string)
then use your regex to extract numbers.
You can also use (?!...) (a negative lookahead assertion), but that won't be as simple.
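A minimal sketch of the two-step approach on the string from the question. The \d+ extraction pattern is an assumption here, so substitute whatever regex you already use for the second step.

```python
import re

a = '123 - 48 <!-- 456 - 251 - --> 452 - 348'

# Step 1: remove every <!-- ... --> block. The non-greedy .*? stops at the
# first closing "-->", so separate comments are removed one by one.
without_comments = re.sub(r'<!--.*?-->', '', a)

# Step 2: extract the remaining numbers (assumed pattern: runs of digits).
numbers = re.findall(r'\d+', without_comments)

print(numbers)  # ['123', '48', '452', '348']
```

Note that . does not match newlines by default, so comments spanning several lines would need flags=re.DOTALL in the re.sub call.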
Upvotes: 5