frazman

Reputation: 33233

Removing numbers from strings

I am working with a text file and apply the following operations to each string:

     def string_operations(string):
         # 1) lowercase
         # 2) remove integers from string
         # 3) remove symbols
         # 4) stemming
         ...

After this, I am still left with strings like:

  durham 28x23

I see the flaw in my approach, but is there a good, fast way to identify whether a term in the string has a numeric value attached to it?

So in the above example, I want the output to be

  durham

Another example:

  21st amendment

Should give:

  amendment

So how do I deal with this stuff?

Upvotes: 0

Views: 130

Answers (1)

larsks

Reputation: 311596

If your requirement is "remove any terms that start with a digit", you could do something like this:

def removeNumerics(s):
    # Keep only the whitespace-separated terms that do not start with a digit.
    return ' '.join(term for term in s.split() if not term[0].isdigit())

This splits the string on whitespace, drops every term whose first character is a digit, and joins the remaining terms with single spaces.

And it works like this:

>>> removeNumerics('21st amendment')
'amendment'
>>> removeNumerics('durham 28x23')
'durham'
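
Note that the check above only looks at the first character, so a term such as 'x23' would be kept. If the requirement turns out to be "remove any term that contains a digit anywhere", a variant along these lines might work. This is only a sketch under that assumption; the name `remove_terms_with_digits` is made up for illustration and is not part of the original answer.

```python
import re

# Hypothetical helper (an assumption, not the original answer's code):
# drops every whitespace-separated term that contains at least one digit,
# regardless of where the digit appears in the term.
_HAS_DIGIT = re.compile(r'\d')

def remove_terms_with_digits(s):
    # s.split() never yields empty strings, so each term has >= 1 character.
    return ' '.join(term for term in s.split() if not _HAS_DIGIT.search(term))

print(remove_terms_with_digits('durham 28x23'))     # durham
print(remove_terms_with_digits('21st amendment'))   # amendment
print(remove_terms_with_digits('route a66 north'))  # route north
```

Compiling the pattern once keeps the per-term check cheap when the function is called on many lines of a file.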

If this isn't what you're looking for, please add some explicit examples to your question, showing both the initial string and your desired result.

Upvotes: 5
