Chris Bunch

Reputation: 89823

Python - Use a Regex to Filter Data

Is there a simple way to remove all characters from a given string that match a given regular expression? I know in Ruby I can use gsub:

>> key = "cd baz ; ls -l"
=> "cd baz ; ls -l"
>> newkey = key.gsub(/[^\w\d]/, "")
=> "cdbazlsl"

What would the equivalent function be in Python?

Upvotes: 5

Views: 17252

Answers (5)

Alexander Borochkin

Reputation: 4621

Maybe the shortest way:

>>> import re
>>> pattern = '[-0-9.]'
>>> price_str = "¥-607.6B"
>>> ''.join(re.findall(pattern, price_str))
'-607.6'
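When the pattern is a single character class, keeping the matches with `findall` gives the same result as deleting the complement with `re.sub` (note the leading `^` in the negated class). A minimal comparison on the same price string; the equivalence only holds for one-character patterns like this one:

```python
import re

price_str = "¥-607.6B"

# Keep only the characters that match the class.
kept = ''.join(re.findall(r'[-0-9.]', price_str))

# Remove every character that does NOT match the class.
# A '-' right after '^' is treated as a literal hyphen.
removed_others = re.sub(r'[^-0-9.]', '', price_str)

print(kept)            # -607.6
print(removed_others)  # -607.6
```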

Upvotes: 0

Alex Martelli

Reputation: 881605

The answers so far have focused on doing the same thing as your Ruby code, which is exactly the reverse of what you're asking in the English part of your question: the code removes characters that DO match, while your text asks for

a simple way to remove all characters from a given string that fail to match

For example, suppose your RE's pattern was r'\d{2,}', "two or more digits" -- so the non-matching parts would be all non-digits plus all single, isolated digits. Removing the NON-matching parts, as your text requires, is also easy:

>>> import re
>>> there = re.compile(r'\d{2,}')
>>> ''.join(there.findall('123foo7bah45xx9za678'))
'12345678'
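For a multi-character pattern such as `r'\d{2,}'` there is no simple negated character class to hand to `re.sub`, which is why joining the `findall` results is the natural way to keep only the matches. The opposite direction, dropping the matches as the Ruby `gsub` call does, is a plain `sub` on the same compiled pattern. A sketch using the same pattern and string:

```python
import re

there = re.compile(r'\d{2,}')  # "two or more digits"
s = '123foo7bah45xx9za678'

# Keep only the matching runs: '123', '45', '678'.
kept = ''.join(there.findall(s))

# Remove the matching runs instead; single digits like '7' and '9' stay.
removed = there.sub('', s)

print(kept)     # 12345678
print(removed)  # foo7bahxx9za
```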

Edit: OK, OP's clarified the question now (he did indeed mean what his code, not his text, said, and now the text is right too;-) but I'm leaving the answer in for completeness (the other answers suggesting re.sub are correct for the question as it now stands). I realize you probably mean what you "say" in your Ruby code, and not what you say in your English text, but, just in case, I thought I'd better complete the set of answers!-)

Upvotes: 6

hughdbrown

Reputation: 49013

re.subn() is your friend:

>>> import re
>>> key = "cd baz ; ls -l"
>>> re.subn(r'\W', "", key)
('cdbazlsl', 6)
>>> re.subn(r'\W', "", key)[0]
'cdbazlsl'

re.subn() returns a tuple of (new string, number of substitutions made). Take the first element if you only want the resulting string, or just call re.sub(), as SilentGhost notes (which is to say, his answer is more exact).
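The second element of the tuple is handy when you want to know whether cleaning changed anything, for example to log or reject suspicious input. A minimal sketch (the variable names are illustrative). Because `r'\W'` matches one character at a time, the count equals the number of characters removed; with `r'\W+'` it counts runs of non-word characters instead:

```python
import re

key = "cd baz ; ls -l"

# re.subn returns (new_string, number_of_substitutions_made).
cleaned, removed_count = re.subn(r'\W', '', key)

# With '+', each contiguous run of non-word characters is one substitution:
# ' ', ' ; ' and ' -' give 3.
_, removed_runs = re.subn(r'\W+', '', key)

if removed_count:
    print(f"stripped {removed_count} character(s) in {removed_runs} run(s): {cleaned!r}")
```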

Upvotes: 2

Jochen Ritzel

Reputation: 107608

import re
old = "cd baz ; ls -l"
regex = r"[^\w\d]"  # which is the same as \W, btw
pat = re.compile(regex)
new = pat.sub('', old)  # 'cdbazlsl'
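Since `\d` is a subset of `\w`, `[^\w\d]` and `\W` match the same characters. One detail worth knowing: in Python 3, `\w` on a `str` pattern is Unicode-aware, so accented letters count as word characters unless you pass `re.ASCII`. A sketch with an assumed accented input string:

```python
import re

old = "café ; ls -l"  # hypothetical input containing a non-ASCII letter

pat = re.compile(r"[^\w\d]")
same_as_W = re.compile(r"\W")

# Unicode mode (the default for str patterns): 'é' is a word character and survives.
unicode_result = pat.sub('', old)

# re.ASCII restricts \w (and therefore \W) to [a-zA-Z0-9_], so 'é' is removed.
ascii_result = re.sub(r"\W", '', old, flags=re.ASCII)

print(unicode_result)  # cafélsl
print(ascii_result)    # caflsl
```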

Upvotes: 2

SilentGhost

Reputation: 319571

import re
re.sub(pattern, '', s)

See the re.sub documentation.
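Besides the pattern, replacement and string, `re.sub` also accepts `count` (0, the default, replaces every match) and `flags`. Passing them by keyword avoids mixing up their order. A short sketch on the question's string:

```python
import re

key = "cd baz ; ls -l"

# Replace every non-word character (count defaults to 0, meaning "all").
all_removed = re.sub(r'\W', '', key)

# Replace only the first two matches: the space after "cd" and the one after "baz".
first_two = re.sub(r'\W', '', key, count=2)

print(all_removed)  # cdbazlsl
print(first_two)    # cdbaz; ls -l
```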

Upvotes: 14
