Reputation: 25928
I am trying to delete all numbers from a string, as long as the number is followed by " ", "grams", "g", "kg" or "kilograms".
I am using a regular expression, but it's not removing any numbers. What's going wrong?
For example, the string "abc 1231g kjsjk jkdsfkjdkj 11kg"
should produce "abc kjsjk jkdsfkjdkj "
Python code:
from re import sub
test = "abc 1231g kjsjk jkdsfkjdkj 11kg"
test = sub("[\d]+[\sg|$grams|$kg|$kilograms]$"," ",test)
print(test)  # every number is still there
Upvotes: 1
Views: 1395
Reputation: 8748
Use \d+\.?\d* to account for decimal numbers, and put grams before g in the alternation (grams|g); otherwise g matches first and you are left with rams.
import re
test = "A test with 1a and 123 and 129kg and 80.5g and 5grams."
test2 = re.sub("\d+\.?\d*(\s|grams|g|kg|kilograms)\s?", "", test)
test2: 'A test with 1a and and and and .'
As written, the question could also mean that you only want to remove the numbers and keep the suffix; in that case you could use a positive lookahead assertion, (?=...):
test2 = re.sub("\d+\.?\d*(?=\s|grams|g|kg|kilograms)\s?", "", test)
test2: 'A test with 1a and and kg and g and grams.'
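To illustrate why the order of the alternatives matters, here is a minimal sketch in Python 3. The sample string and the variable names are made up for this illustration; the point is only that the regex engine takes the first alternative that succeeds, so a shorter alternative listed first can cut a longer suffix short.

```python
import re

text = "5grams and 80.5g"

# "g" listed first: the engine accepts "g" and stops, leaving "rams" behind.
short_first = re.sub(r"\d+\.?\d*(g|grams)", "", text)

# "grams" listed first: the whole word is consumed before "g" is tried.
long_first = re.sub(r"\d+\.?\d*(grams|g)", "", text)

print(repr(short_first))  # 'rams and '
print(repr(long_first))   # ' and '
```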
Upvotes: 0
Reputation: 17505
Your regular expression is not capturing what you're looking for. Square brackets [] define a character class, which matches a single character, so [\sg|$...] isn't what you want. You should try:
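test = sub("\d+(\s|grams|g|kg|kilograms)", " ", test)
Here, we start with \d+ for the number, then use parentheses () for grouping and put all the possible suffixes inside them, separated by |. Note that grams is listed before g: alternatives are tried left to right, so g on its own would match first and leave "rams" behind.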
To get the output you specified, we need to change a few more things. The replacement string should be "" instead of " ", and we need to pick up an extra space at the end by appending \s? to the regex.
test = sub("\d+(\s|grams|g|kg|kilograms)\s?", "", test)
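As a quick check against the string from the question, here is a small Python 3 sketch. The helper name strip_weights is hypothetical and only used here; the pattern lists grams before g so that a longer suffix is not cut short.

```python
import re

# Hypothetical helper name, used only for this illustration.
def strip_weights(text):
    # digits, then a space or a unit suffix, then an optional trailing space
    return re.sub(r"\d+(\s|grams|g|kg|kilograms)\s?", "", text)

result = strip_weights("abc 1231g kjsjk jkdsfkjdkj 11kg")
print(repr(result))  # 'abc kjsjk jkdsfkjdkj '
```

The raw string prefix (r"...") is there so that Python passes backslashes such as \d and \s through to the regex engine unchanged.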
Upvotes: 2
Reputation: 183251
Square brackets [...] and dollar signs $ do not mean what you think they do. You need:
test = sub("\d+(\s|grams|g|kg|kilograms)"," ",test)
What [\sg|$grams|$kg|$kilograms] means is "a whitespace character (\s), or any of these characters: g|$grams|$kg|$kilograms"; so [\sg|$grams|$kg|$kilograms] is equivalent to [\s$|agiklomrs], and roughly equivalent to (\s|\$|\||a|g|i|k|l|o|m|r|s).
Outside a character class, what $ means is "only match if this is the very end of the string" (or just before a trailing newline).
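Both points can be checked directly. The following is a rough Python 3 sketch; the variable names and the second test string (ending in "11g") are assumptions made for the demonstration, not part of the question.

```python
import re

char_class = r"[\sg|$grams|$kg|$kilograms]"

# Each of these single characters matches the class on its own,
# including the literal "|" and "$".
members = [c for c in " g|$ramskilo" if re.fullmatch(char_class, c)]

# The pattern from the question: digits, ONE class character, end of string.
pattern = r"[\d]+" + char_class + "$"

# "11kg" fails: after "11" and "k" there is still a "g" before the end.
no_match = re.search(pattern, "abc 1231g kjsjk jkdsfkjdkj 11kg")

# A string that ends in digits plus exactly one class character does match.
end_match = re.search(pattern, "abc 1231g kjsjk jkdsfkjdkj 11g")

print(members)
print(no_match)
print(end_match.group())
```

This is why the original call left every number in place: the only candidate at the end of the string, "11kg", has two characters after the digits, and the class can consume just one of them.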
Upvotes: 1