Reputation: 241
I am trying to extract the text in a string that comes before a number, for example:
"Diablo Lord Of Destruction 9.2"
This string is one element of a list read from a file, such that file[2] = "Diablo Lord Of Destruction 9.2".
How can I write code that selects only the text and leaves out the number and any whitespace before it, as below?
"Diablo Lord Of Destruction"
I understand that, for this one string, I could simply slice it:
contents = file[2]
print(contents[:-4])  # drops the trailing " 9.2"
Since the values will change, I need a more robust solution that can handle numbers of different lengths and varying amounts of whitespace.
Upvotes: 2
Views: 196
Reputation: 12243
If you'll always have a space before the number, you can split the string. For example:
contents = file[2].split() # Gives a list split by whitespace
contents.pop() # Dump the number
finalStr = ' '.join(contents)
From running a test:
>>> test = "Diablo Lord Of Destruction 9.2"
>>> contents = test.split()
>>> contents
['Diablo', 'Lord', 'Of', 'Destruction', '9.2']
>>> contents.pop()
'9.2'
>>> finalStr = ' '.join(contents)
>>> finalStr
'Diablo Lord Of Destruction'
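If the number is always the last whitespace-separated token, a variant of the same idea is to split only once, from the right, which leaves any spacing inside the title untouched. A minimal sketch, assuming that layout; strip_trailing_number is just a hypothetical helper name:

```python
def strip_trailing_number(s):
    # Hypothetical helper: assumes the number is the last whitespace-separated token.
    # rsplit(maxsplit=1) splits once, from the right, on the final run of
    # whitespace, so spacing inside the title is preserved.
    parts = s.rsplit(maxsplit=1)
    # With fewer than two tokens there is no title/number pair; return the input.
    return parts[0] if len(parts) == 2 else s


print(strip_trailing_number("Diablo Lord Of Destruction 9.2"))
print(strip_trailing_number("Diablo  Lord Of Destruction    10.25"))
```

Unlike ' '.join(...), this keeps double spaces inside the title; like the split() approach, it removes the last token unconditionally, whether or not it is actually a number.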
Upvotes: 3
Reputation: 414207
To get all text until the first number is encountered:
import re
s = "Diablo Lord Of Destruction 9.2"
print(re.match(r'\D*', s).group(0).rstrip())  # rstrip() drops the space before the number
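One thing to be aware of with \D*: it stops at the first digit anywhere in the string, not just at the trailing number. A small sketch that wraps the same pattern in a hypothetical helper, text_before_first_digit, uses rstrip() to drop the whitespace before the number, and tries it on a few inputs (the last title is made up to show the caveat):

```python
import re


def text_before_first_digit(s):
    # Hypothetical helper. \D* matches zero or more non-digit characters from
    # the start of the string; it can match the empty string, so re.match
    # never returns None here.
    prefix = re.match(r'\D*', s).group(0)
    # Drop the whitespace that sat between the text and the number.
    return prefix.rstrip()


for title in ["Diablo Lord Of Destruction 9.2",
              "Diablo Lord Of Destruction     12.75",
              "Diablo 2 Lord Of Destruction 9.2"]:  # made-up title containing a digit
    print(repr(text_before_first_digit(title)))
```

The first two inputs both give 'Diablo Lord Of Destruction' regardless of how much whitespace precedes the number; the third gives only 'Diablo'.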
Upvotes: 1
Reputation: 8767
You can use regular expressions and the re.sub() function. From the Python documentation:
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed.
>>> import re
>>> re.sub('[0-9.]+', '', 'Diablo Lord of Destruction 9.2')[:-1]
'Diablo Lord of Destruction'
>>> re.sub(r'[\d.]+', '', 'Diablo Lord of Destruction 9.2')[:-1]
'Diablo Lord of Destruction'
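The code above finds every run of characters from the class [0-9.] (equivalently [\d.]), that is, digits and periods, and replaces it with ''. The [:-1] slice then trims the last character, which here is the single space that preceded the number; if there may be more than one space, use .rstrip() instead of [:-1].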
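Because the character class is applied everywhere, this also deletes digits and periods inside the title. If only the number at the end should go, one alternative is to anchor the pattern to the end of the string with $ and let it consume the whitespace in front of the number, which removes the need for [:-1]. A sketch, assuming the number looks like 9, 9.2 or 10.25 (TRAILING_NUMBER and remove_trailing_number are hypothetical names):

```python
import re

# Optional whitespace, then a number such as 9, 9.2 or 10.25, at the very
# end of the string (optionally followed by trailing whitespace).
TRAILING_NUMBER = re.compile(r'\s*\d+(?:\.\d+)?\s*$')


def remove_trailing_number(s):
    # count=1: only the final number is removed; digits and periods earlier
    # in the title are left alone.
    return TRAILING_NUMBER.sub('', s, count=1)


print(remove_trailing_number("Diablo Lord Of Destruction 9.2"))
print(remove_trailing_number("Diablo Lord Of Destruction    10.25  "))
```

On "Diablo Lord Of Destruction 9.2 111" this removes only " 111", since the pattern matches a single number at the very end.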
Upvotes: 3
Reputation: 7864
This is a perfect job for regular expressions. Specifically, you can use the following code to extract all of the text that precedes a number:
import re
s = "Diablo Lord Of Destruction 9.2"
print('Text:', re.match(r'([^0-9]+)', s).group(1).rstrip())
Regular expressions are a bit of a pain to master but well worth the effort.
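One caveat: [^0-9]+ needs at least one non-digit at the start, so re.match returns None when the string begins with a digit, and calling .group(1) on None raises AttributeError. A minimal sketch with a guard, assuming an empty string is an acceptable result in that case (text_before_number is a hypothetical name):

```python
import re


def text_before_number(s):
    # Hypothetical helper: returns the text before the first digit, or ''
    # if the string starts with a digit (re.match returns None then).
    m = re.match(r'([^0-9]+)', s)
    if m is None:
        return ''
    # group(1) still ends with the space(s) before the number; strip them.
    return m.group(1).rstrip()


print(text_before_number("Diablo Lord Of Destruction 9.2"))
print(repr(text_before_number("9.2 Diablo")))
```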
Upvotes: 3
Reputation: 9260
How about...
''.join(filter(lambda ch: not ch.isdigit() and ch != '.',
               "Diablo Lord Of Destruction 9.2")).rstrip()
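Note that in Python 3 filter() returns an iterator rather than a string, so its result has to be joined back together with ''.join(...). Another way to delete a fixed set of characters is str.translate with a table built by str.maketrans, whose third argument lists characters to remove. A sketch (DELETE_DIGITS_AND_DOTS and remove_digits are hypothetical names); like the filter version, it removes digits anywhere in the title, not only at the end:

```python
# Hypothetical table: map every digit and the period to None (i.e. delete them).
DELETE_DIGITS_AND_DOTS = str.maketrans('', '', '0123456789.')


def remove_digits(s):
    # translate() deletes the listed characters in one pass; rstrip() then
    # removes the whitespace that preceded the number.
    return s.translate(DELETE_DIGITS_AND_DOTS).rstrip()


print(remove_digits("Diablo Lord Of Destruction 9.2"))
```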
Upvotes: 2
Reputation: 11173
This removes any digits and full stops from your string:
>>> import re
>>> filtered = re.sub('[0-9.]+', '', "Diablo Lord Of Destruction 9.2 111")
>>> filtered
'Diablo Lord Of Destruction  '
>>> filtered.strip()  # you might want to get rid of the trailing spaces too!
'Diablo Lord Of Destruction'
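If regular expressions feel like overkill, str.rstrip also accepts a set of characters and strips any trailing run of them, in any order, so it copes with numbers of any length and any amount of whitespace. A sketch (NUMBER_SUFFIX_CHARS and strip_number_suffix are hypothetical names); note that it would also eat a period that genuinely ends the title, and unlike re.sub it leaves digits in the middle of the string alone:

```python
import string

# Digits, the decimal point, and any whitespace (spaces, tabs, newlines).
NUMBER_SUFFIX_CHARS = string.digits + '.' + string.whitespace


def strip_number_suffix(s):
    # rstrip() removes the longest trailing run made up only of these
    # characters, so "9.2 111" and the spaces before it all go.
    return s.rstrip(NUMBER_SUFFIX_CHARS)


print(strip_number_suffix("Diablo Lord Of Destruction 9.2 111"))
```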
Upvotes: 7