Reputation: 14684
How can I make a Python regex that removes all characters from a text except those matched by \d, \w and \s?
I tried something like this:
import re
re.sub(r'\W*\D*\S*', '', 'this is my<\n test <+-,1345.;>')
But this gives me an empty string. I want to get: this is my test ,1345. I would also like to keep the , and the .
Upvotes: 0
Views: 457
Reputation: 1121624
Use an inverted (negated) character class:
re.sub(r'[^., \t\w]*', '', 'this is my<\n test <+-,1345.;>')
Demo:
>>> re.sub(r'[^., \t\w]*', '', 'this is my<\n test <+-,1345.;>')
'this is my test ,1345.'
The \W, \S and \D classes are too wide. \D, for example, matches most of what \w matches, so removing anything that falls in \D removes too much: it removes everything that is not a digit, but you wanted to keep letters and punctuation.
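To make this concrete, here is a small sketch (the variable names are only for illustration) showing that \D matches letters, spaces and punctuation, and that the pattern from the question matches the whole input in one go. The three classes in the question are applied in sequence, so between them they consume everything:

```python
import re

text = 'this is my<\n test <+-,1345.;>'

# \D matches any non-digit: letters, spaces and punctuation included.
non_digits = re.findall(r'\D', 'ab 1,')
print(non_digits)

# \W* matches nothing at the start, \D* then consumes everything up to
# the first digit, and \S* consumes the rest (digits and punctuation).
whole_match = re.match(r'\W*\D*\S*', text).group(0)
print(whole_match == text)
```

Because a single match covers the entire string, re.sub replaces all of it with the empty string.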
I replaced \s with a literal space and a tab, because you want to remove newlines (which are also whitespace), and I added . and , so that those are kept too. \d is not needed either, because \w is a superset of \d.
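If you need this in more than one place, the same character class can be wrapped in a small helper. This is only a sketch: the name keep_allowed is made up for illustration, and it uses + instead of * so that the pattern never matches the empty string (the result is the same here):

```python
import re

# Same negated class as above, compiled once. Anything that is NOT
# '.', ',', a space, a tab or a word character gets removed.
_DISALLOWED = re.compile(r'[^., \t\w]+')

def keep_allowed(text):
    """Hypothetical helper: strip every character outside the allowed set."""
    return _DISALLOWED.sub('', text)

result = keep_allowed('this is my<\n test <+-,1345.;>')
print(result)  # this is my test ,1345.
```

Note that \w also matches the underscore, and in Python 3 it matches Unicode letters and digits on str patterns; pass re.ASCII to re.compile if you only want [a-zA-Z0-9_].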
Upvotes: 5