Ferdinand

Reputation: 241

How do I truncate a string at the last letter before the first occurrence of a digit?

I am trying to extract the text from a string. The text always comes before a number, for example:

"Diablo Lord Of Destruction 9.2"

This string is one element read from a file, such that file[2] = "Diablo Lord Of Destruction 9.2"

How can I write code that selects only the text and leaves out the number and any whitespace before it (as below)?

"Diablo Lord Of Destruction"

I understand that for this particular string you can do something like this:

contents = file[2]
print(contents[:-4])  # drops the last four characters, " 9.2"

Since the values will change, I need a more robust solution that can handle numbers of different lengths and varying amounts of whitespace.

Upvotes: 2

Views: 196

Answers (6)

thegrinner

Reputation: 12243

If you'll always have a space before the number, you can split the string. For example:

contents = file[2].split() # Gives a list split by whitespace
contents.pop() # Dump the number
finalStr = ' '.join(contents)

From running a test:

>>> test = "Diablo Lord Of Destruction 9.2"
>>> contents = test.split()
>>> contents
['Diablo', 'Lord', 'Of', 'Destruction', '9.2']
>>> contents.pop()
'9.2'
>>> finalStr = ' '.join(contents)
>>> finalStr
'Diablo Lord Of Destruction'
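
If the number is always the last whitespace-separated token, the same idea can probably be sketched with a single rsplit call. The helper name text_before_last_token is made up for illustration, and returning an empty string when there are fewer than two tokens is an assumption about what you want:

```python
def text_before_last_token(s):
    """Return everything before the last whitespace-separated token."""
    # rsplit(None, 1) splits once, from the right, on any run of whitespace.
    parts = s.rsplit(None, 1)
    # Assumption: with fewer than two tokens there is no text to keep.
    return parts[0] if len(parts) == 2 else ""


print(text_before_last_token("Diablo Lord Of Destruction 9.2"))
print(text_before_last_token("Diablo Lord Of Destruction    10.25"))
```

Unlike split() followed by ' '.join(), this leaves any spacing inside the title untouched.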

Upvotes: 3

jfs

Reputation: 414207

To get all the text up to the first digit (this keeps the whitespace in front of the number, so add .rstrip() if you do not want it):

import re

s = "Diablo Lord Of Destruction 9.2"
print(re.match(r'\D*', s).group(0))
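
Wrapped in a function, the approach above might look like the sketch below. The name text_before_first_digit and the sample strings are only illustrative, and the rstrip() call assumes you want the whitespace in front of the number removed, as the question asks:

```python
import re


def text_before_first_digit(s):
    """Return the text before the first digit, without trailing whitespace."""
    # \D* matches the longest run of non-digit characters at the start of s.
    # It can match the empty string, so re.match never returns None here.
    return re.match(r"\D*", s).group(0).rstrip()


for sample in ("Diablo Lord Of Destruction 9.2",
               "Diablo Lord Of Destruction    1024",
               "No number here"):
    print(repr(text_before_first_digit(sample)))
```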

Upvotes: 1

Robert

Reputation: 8767

You can utilize regular expressions and the sub() method:

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed.

>>> import re
>>> re.sub(r'[0-9.]+', '', 'Diablo Lord of Destruction 9.2')[:-1]
'Diablo Lord of Destruction'
>>> re.sub(r'[\d.]+', '', 'Diablo Lord of Destruction 9.2')[:-1]
'Diablo Lord of Destruction'

The code above removes every digit and period anywhere in the string (the character class [0-9.] or [\d.]) by replacing it with ''. The [:-1] slice then trims the last character, which is the space that preceded the number.
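
Because that character class also strips digits and periods inside the title, and the [:-1] slice assumes exactly one space, a pattern tied to the first digit may be more robust. This is only a sketch: strip_from_first_number is a hypothetical helper name, and "Dr. Mario 64" is a made-up input used to show the difference:

```python
import re


def strip_from_first_number(s):
    """Drop the first digit, everything after it, and the whitespace before it."""
    # \s*  the whitespace in front of the number
    # \d   the first digit in the string
    # .*$  the rest of the string (DOTALL lets . match newlines too)
    return re.sub(r"\s*\d.*$", "", s, count=1, flags=re.DOTALL)


print(repr(strip_from_first_number("Diablo Lord of Destruction 9.2")))
print(repr(strip_from_first_number("Dr. Mario 64")))
# For comparison, the character-class version also removes the period in "Dr.".
print(repr(re.sub(r"[0-9.]+", "", "Dr. Mario 64")))
```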

Upvotes: 3

Rakis

Reputation: 7864

This is a perfect job for regular expressions. Specifically, you can use the following code to extract all of the text that precedes a number:

import re
s = "Diablo Lord Of Destruction 9.2"
print('Text:', re.match(r'([^0-9]+)', s).group(1))

Regular expressions are a bit of a pain to master but well worth the effort.
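
One thing to watch for with the + quantifier: re.match returns None when the string starts with a digit or is empty, so calling .group() directly would raise an AttributeError. A defensive sketch, where the leading_text name and the choice to return an empty string are assumptions:

```python
import re


def leading_text(s):
    """Return the non-digit text at the start of s, or '' if there is none."""
    match = re.match(r"[^0-9]+", s)
    # re.match returns None when s starts with a digit (or is empty),
    # so check before calling .group().
    if match is None:
        return ""
    return match.group(0).rstrip()


print(repr(leading_text("Diablo Lord Of Destruction 9.2")))
print(repr(leading_text("9.2 Diablo")))
```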

Upvotes: 3

nullpotent

Reputation: 9260

How about...

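# In Python 3, filter() returns an iterator, so join the result back into a string.
# Only the digits are dropped, so this gives 'Diablo Lord Of Destruction .'
''.join(filter(lambda ch: not ch.isdigit(), "Diablo Lord Of Destruction 9.2"))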
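
Since filtering drops digits wherever they occur and keeps the period, a regex-free way to stop at the first digit could use itertools.takewhile instead. A minimal sketch, with take_until_digit as a made-up name:

```python
from itertools import takewhile


def take_until_digit(s):
    """Return the characters before the first digit, without trailing whitespace."""
    # takewhile stops at the first character for which the predicate is false,
    # so nothing from the first digit onward is kept.
    prefix = "".join(takewhile(lambda ch: not ch.isdigit(), s))
    return prefix.rstrip()


print(repr(take_until_digit("Diablo Lord Of Destruction 9.2")))
```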

Upvotes: 2

Maria Zverina

Reputation: 11173

This removes any digits and full stops from your string:

>>> import re
>>> filtered = re.sub('[0-9.]*', '', "Diablo Lord Of Destruction 9.2  111")
>>> filtered
'Diablo Lord Of Destruction   '
>>> filtered.strip()           # you might want to get rid of the trailing space too!
'Diablo Lord Of Destruction'
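
If the titles themselves might contain digits or periods (an assumption that goes beyond the question), one option is to anchor the pattern to the end of the string so that only the trailing numbers are removed. The function name and the second sample string are illustrative only:

```python
import re


def strip_trailing_numbers(s):
    """Remove a trailing run of numbers and the whitespace in front of it."""
    # \s*       whitespace in front of the trailing numbers
    # \d        the run must start with a digit
    # [\d.\s]*  then only digits, periods and whitespace up to the end
    return re.sub(r"\s*\d[\d.\s]*$", "", s)


print(repr(strip_trailing_numbers("Diablo Lord Of Destruction 9.2  111")))
print(repr(strip_trailing_numbers("Diablo 2 Lord Of Destruction 9.2")))
```

Note that this keeps a digit inside the title, so it answers a slightly different question from "everything before the first digit".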

Upvotes: 7
