Reputation: 484
I'm trying to clean some text by removing alphanumeric characters from the end of each string, but my pattern also removes normal words, as shown in the output below. Can someone help me achieve the expected result?
re.sub(r'[a-zA-Z0-9/]{5,}$', '', text)
Input:
asus zenfone 3s max zc521tl
asus zenfone max plus (m1) zb570tl
asus zenfone max pro (m1) zb601kl/zb602k
nokia 3.1 c
nokia 3
asus zenfone 3 zoom ze553k
asus zenfone 3 deluxe zs570kl
blackberry keyone
htc explorer
lg tribute
acer liquid z520
Output:
asus zenfone 3s max
asus zenfone max plus (m1)
asus zenfone max pro (m1)
nokia 3.1 c
nokia 3
asus zenfone 3 zoom
asus zenfone 3 deluxe
blackberry
htc
lg
acer liquid z520
Expected output:
asus zenfone 3s max
asus zenfone max plus (m1)
asus zenfone max pro (m1)
nokia 3.1 c
nokia 3
asus zenfone 3 zoom
asus zenfone 3 deluxe
**blackberry keyone**
**htc explorer**
**lg tribute**
acer liquid z520
Upvotes: 1
Views: 1324
Reputation: 163362
If it should be the last word in a string and there are always multiple words, you might use:
[ \t]+(?=[a-zA-Z0-9/]{5})[a-zA-Z/]*[0-9][a-zA-Z0-9/]*[A-Za-z]$
[ \t]+ : Match 1+ spaces or tabs
(?=[a-zA-Z0-9/]{5}) : Assert at least 5 chars of any of the listed
[a-zA-Z/]* : Match 0+ times any of the listed
[0-9] : Match a digit
[a-zA-Z0-9/]* : Match 0+ times any of the listed in the character class
[A-Za-z] : Match a char a-zA-Z
$ : End of string
In the replacement use an empty string.
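As a rough sketch of how this pattern might be applied in Python with re.sub, as in the question (the helper name strip_model_code and the sample list are only illustrative, not part of the answer):

```python
import re

# Pattern from the answer above: leading spaces/tabs, then a final token of
# 5+ characters that contains a digit and ends with a letter.
PATTERN = re.compile(
    r'[ \t]+(?=[a-zA-Z0-9/]{5})[a-zA-Z/]*[0-9][a-zA-Z0-9/]*[A-Za-z]$'
)

def strip_model_code(text):
    """Hypothetical helper: remove a trailing model code if the pattern matches."""
    return PATTERN.sub('', text)

samples = [
    'asus zenfone 3s max zc521tl',
    'asus zenfone max pro (m1) zb601kl/zb602k',
    'blackberry keyone',
    'acer liquid z520',
]
for sample in samples:
    print(strip_model_code(sample))
```

Because the pattern consumes the whitespace before the token, no trailing space is left behind. A final token that ends in a digit, such as z520, is not matched because the pattern requires a closing letter.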
Upvotes: 1
Reputation: 3893
You can add a positive look-ahead to the regex that requires the word at the end to contain at least one digit for it to be removed: (?=\D*\d). That will prevent it from removing normal words that don't contain numbers.
The complete program:
#!/usr/bin/env python3
import re
texts = [
    'asus zenfone 3s max zc521tl',
    'asus zenfone max plus (m1) zb570tl',
    'asus zenfone max pro (m1) zb601kl/zb602k',
    'nokia 3.1 c',
    'nokia 3',
    'asus zenfone 3 zoom ze553k',
    'asus zenfone 3 deluxe zs570kl',
    'blackberry keyone',
    'htc explorer',
    'lg tribute',
    'acer liquid z520',
]
for text in texts:
    # Remove a trailing token of 5+ characters only if it contains a digit.
    print(re.sub(r'(?=\D*\d)[a-zA-Z0-9/]{5,}$', '', text))
It outputs:
asus zenfone 3s max
asus zenfone max plus (m1)
asus zenfone max pro (m1)
nokia 3.1 c
nokia 3
asus zenfone 3 zoom
asus zenfone 3 deluxe
blackberry keyone
htc explorer
lg tribute
acer liquid z520
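One detail that printed output does not show: the look-ahead pattern removes only the token itself, so the space in front of it stays. A small sketch comparing it with an assumed variant that also consumes the preceding whitespace (the \s* prefix and the names ORIGINAL and VARIANT are my addition, not part of the answer):

```python
import re

# Pattern from the answer: a trailing token of 5+ chars that contains a digit.
ORIGINAL = r'(?=\D*\d)[a-zA-Z0-9/]{5,}$'
# Assumed variant: additionally swallow any whitespace before that token.
VARIANT = r'\s*(?=\D*\d)[a-zA-Z0-9/]{5,}$'

text = 'asus zenfone 3 zoom ze553k'

# repr() makes a trailing space visible in the printed result.
print(repr(re.sub(ORIGINAL, '', text)))  # 'asus zenfone 3 zoom '
print(repr(re.sub(VARIANT, '', text)))   # 'asus zenfone 3 zoom'
```

Calling .rstrip() on the result of the original pattern would be another way to get the same effect.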
Upvotes: 1