Reputation: 33233
I am working with a text file, and I apply the following operations to each string:
def string_operations(string):
1) lowercase
2) remove integers from string
3) remove symbols
4) stemming
After this, I am still left with strings like:
durham 28x23
I see the flaw in my approach, but I would like to know if there is a good, fast way to detect whether a numeric value is attached to a term in the string.
So in the above example, I want the output to be
durham
Another example:
21st amendment
Should give:
amendment
So how do I deal with this stuff?
Upvotes: 0
Views: 130
Reputation: 311596
If your requirement is "remove any terms that start with a digit", you could do something like this:
def removeNumerics(s):
    return ' '.join([term for term in s.split() if not term[0].isdigit()])
This splits the string on whitespace, keeps only the terms that do not start with a digit, and joins them back together with single spaces.
And it works like this:
>>> removeNumerics('21st amendment')
'amendment'
>>> removeNumerics('durham 28x23')
'durham'
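If the requirement turns out to be "remove any term that contains a digit anywhere" (so that something like 'x66' is dropped too), a minimal sketch could look like the following. The function name remove_terms_with_digits and the 'route x66 north' input are made up for illustration and are not from the question.

```python
def remove_terms_with_digits(s):
    # Hypothetical variant: drop every whitespace-separated term
    # that has at least one digit character anywhere in it.
    return ' '.join(
        term for term in s.split()
        if not any(ch.isdigit() for ch in term)
    )


print(remove_terms_with_digits('durham 28x23'))      # durham
print(remove_terms_with_digits('21st amendment'))    # amendment
print(remove_terms_with_digits('route x66 north'))   # route north
```

Note that str.isdigit() also returns True for some non-ASCII digit characters, which may or may not matter for your data.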
If this isn't what you're looking for, please add some explicit examples to your question, showing both the initial string and your desired result.
Upvotes: 5