gurehbgui

Reputation: 14684

python regex, remove all except \d \s \w

How can I write a Python regex that removes all characters from a text except those matched by \d, \w and \s?

I tried something like this:

import re
re.sub(r'\W*\D*\S*', '', 'this is my<\n test <+-,1345.;>')

But this gives me an empty string. I want to get: this is my test ,1345. I would also like to keep the , and . characters.

Upvotes: 0

Views: 457

Answers (1)

Martijn Pieters

Reputation: 1121624

Use an inverted (negated) character class:

re.sub(r'[^., \t\w]*', '', 'this is my<\n test <+-,1345.;>')

Demo:

>>> re.sub(r'[^., \t\w]*', '', 'this is my<\n test <+-,1345.;>')
'this is my test ,1345.'

The \W, \S and \D classes are too wide. \D, for example, matches most of what \w matches (every word character that is not a digit), so removing anything that falls in \D removes too much: everything that is not a digit gets removed, but you wanted to keep letters and some punctuation.
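
To see how wide these classes are, here is a small sketch. The short sample string 'ab 12.' is only an illustration and is not taken from the question:

```python
import re

text = 'this is my<\n test <+-,1345.;>'

# \D matches every character that is not a digit: letters, spaces, punctuation.
non_digits = re.findall(r'\D', 'ab 12.')
print(non_digits)  # ['a', 'b', ' ', '.']

# The original pattern therefore swallows the whole input in a single match:
# \W* matches nothing, \D* runs up to the first digit, \S* takes the rest.
whole = re.match(r'\W*\D*\S*', text).group()
print(whole == text)  # True
```

Because that single match covers the entire string, replacing it with '' leaves nothing behind.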

I replaced \s with a literal space and a tab, because you want to remove newlines (which are also whitespace), and I added . and , so that those are kept too. \d is not needed either, because \w is a superset of \d.
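
The choices above can be checked with a short sketch. The function name keep_word_chars is hypothetical, and I used + instead of * here so the pattern never matches an empty string; for this input the result is the same either way:

```python
import re

text = 'this is my<\n test <+-,1345.;>'

def keep_word_chars(s):
    """Remove everything except word characters, spaces, tabs, '.' and ','."""
    # Hypothetical helper: the negated class lists what to KEEP.
    return re.sub(r'[^., \t\w]+', '', s)

cleaned = keep_word_chars(text)
print(repr(cleaned))  # 'this is my test ,1345.'

# With \s in the class instead of a literal space and tab, the newline survives.
with_newline = re.sub(r'[^.,\s\w]+', '', text)
print(repr(with_newline))  # 'this is my\n test ,1345.'

# \w already covers digits, so \d adds nothing to the class.
digits_are_word_chars = re.fullmatch(r'\w+', '1345') is not None
print(digits_are_word_chars)  # True
```

Note that \w also matches the underscore, and for str patterns in Python 3 it matches Unicode letters and digits too, so those are kept as well.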

Upvotes: 5
