Reputation: 2983
I want to remove numbers with decimal places from a string of text using a regex. Here is what I have so far:
import re
obj = '''This is my #1 [email protected] <body/> 2 3 4 5 2345! 23542 312453 76666374 56s34534
1. [email protected]
1978-12-01 12:00:00 1.23 21.243
<script>function stripScripts(s) {
var div = document.createElement('div');
div.innerHTML = s;
var scripts = div.getElementsByTagName('script');
var i = scripts.length;
while (i--) {
scripts[i].parentNode.removeChild(scripts[i]);
}
return div.innerHTML;
}</script> 99.258 245.643.3456!'''
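# Raw string so the backslash escapes reach the regex engine unchanged
regex1 = re.compile(r'(?is)(<script[^>]*>)(.*?)(</script>)|(<.*?>)|(?<!\S)\d+(?!\S)')
out1 = regex1.sub(' ', obj)
print(out1)
# Collapse runs of whitespace into single spaces
data = ' '.join(out1.split())
print(data)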
This regex removes most of what I need it to, but it leaves 1.23, 21.243 and 99.258. I would like to extend the current regex so that it removes those values as well:
regex = (?is)(<script[^>]*>)(.*?)(</script>)|(<.*?>)|(?<!\S)\d+(?!\S)
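For anyone wondering why the decimals survive: the last alternative, (?<!\S)\d+(?!\S), only matches a run of digits with whitespace (or the string boundary) on both sides, and the "." in 1.23 is not whitespace. A minimal sketch of that behaviour follows. The names whole_number, number_token and sample are made up for illustration, and the second pattern is just one possible extension, not necessarily what you want for every input.

```python
import re

# Last alternative of the pattern in the question, taken on its own.
whole_number = re.compile(r'(?<!\S)\d+(?!\S)')
# Hypothetical variant: same whitespace boundaries, optional fractional part.
number_token = re.compile(r'(?<!\S)\d+(?:\.\d+)?(?!\S)')

sample = '2 3 2345! 1.23 21.243 99.258'

print(whole_number.findall(sample))  # ['2', '3']
print(number_token.findall(sample))  # ['2', '3', '1.23', '21.243', '99.258']
```

In both cases 2345! is left alone, because the "!" right after the digits makes the (?!\S) lookahead fail.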
Upvotes: 1
Views: 2392
Reputation: 2983
Thanks @Joran Beasley! I tried this and it worked.
(?is)(<script[^>]*>)(.*?)(</script>)|(<.*?>)|(?<!\S)\d+(?!\S)|([0-9]+\.[0-9]+ )
What is the advantage of adding the first "d" here?
(\d+\.[0-9 ]+)
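One thing to watch with ([0-9]+\.[0-9]+ ): the trailing space is a literal part of the pattern, so a decimal followed by a newline or sitting at the very end of the string would not match. A small sketch, assuming a made-up sample string, compares it with a (?!\S) lookahead, which is only one possible alternative:

```python
import re

# Illustrative input: one decimal before a space, one before a newline,
# one at the end of the string.
sample = '1.23 21.243\n99.258'

# Pattern from this answer: the literal trailing space is part of the match.
with_space = re.findall(r'[0-9]+\.[0-9]+ ', sample)
# Assumed alternative: a lookahead that accepts any whitespace or end of string.
with_lookahead = re.findall(r'[0-9]+\.[0-9]+(?!\S)', sample)

print(with_space)      # ['1.23 ']
print(with_lookahead)  # ['1.23', '21.243', '99.258']
```

The lookahead version also leaves the separating whitespace in place, so the later ' '.join(out1.split()) step still sees it.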
Upvotes: 0
Reputation: 114038
re.sub(r"\d*\.\d+", "", the_text)
Wouldn't that work? Or maybe:
re.sub(r"(\d*\.\d+)|(\d+\.[0-9 ]+)", "", the_text)
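A caveat worth checking before using the first pattern: \d*\.\d+ is not tied to token boundaries, so it also matches inside longer tokens such as 245.643.3456!. Whether that is wanted depends on the data. The sketch below compares it with a whitespace-bounded variant; the variable names and the shortened the_text value are assumptions for illustration, not taken from the question.

```python
import re

# Shortened stand-in for the text in the question.
the_text = '1.23 21.243 99.258 245.643.3456!'

# Pattern from this answer: matches any "digits . digits" run, anywhere.
unanchored = re.sub(r'\d*\.\d+', '', the_text)
# Assumed alternative: only decimals that stand alone between whitespace.
bounded = re.sub(r'(?<!\S)\d+\.\d+(?!\S)', '', the_text)

# Collapse leftover whitespace the same way the question does.
print(' '.join(unanchored.split()))  # !
print(' '.join(bounded.split()))     # 245.643.3456!
```

With the unanchored pattern, 245.643 and .3456 are removed as two separate matches and only the "!" is left; the bounded variant keeps the whole token.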
Upvotes: 2