Reputation: 575
Using Python regular expressions, how can I remove a unit word that follows a number?
e.g.
units = ['in', 'ft']
'12in desk' becomes '12 desk'
'12 in desk' becomes '12 desk'
'abc 20 ft long' becomes 'abc 20 long'
Upvotes: 2
Views: 1931
Reputation: 113
The code below removes the unit that follows a number. It is an alternative to @wesanyer's answer.
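import re
units = '|'.join(['in', 'ft'])
# Group the alternation and capture the unit (plus any leading spaces) after the number
pattern = r"[0-9]+(\s*(?:" + units + r"))\b"
a = "12in desk"
match = re.search(pattern, a)
if match:
    # Strings are immutable, so build a new string without the captured span
    a = a[:match.start(1)] + a[match.end(1):]
print(a)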
Upvotes: 0
Reputation: 1012
Here is another way, similar to @Rob's answer. Rather than using re.sub, I capture all the relevant groups and then put the string
back together, omitting the third group, which contains the unit.
import re
units = '|'.join(['in', 'ft'])
vals = ['12in desk', '12 in desk', 'abc 20 ft long']
# Groups: 1 = text before the number, 2 = the number, 3 = the unit, 4 = the rest
pattern = r'([^\d]*)(\d+)\s?({})(.*)'.format(units)
regex = re.compile(pattern)
for val in vals:
    match = regex.match(val)
    if match:
        # Rejoin everything except group 3 (the unit)
        out = ''.join(match.group(1, 2, 4))
        print("{} becomes {}".format(val, out))
Upvotes: 1
Reputation: 168726
Here is one way, programmatically constructing the regular expression from the units
list:
import re
units = ['in', 'ft']
tests = ['12in desk', '12 in desk', 'abc 20 ft long', ]
expecteds = ['12 desk', '12 desk', 'abc 20 long', ]
regexp = re.compile(r'(\d+)\s*(%s)\b' % '|'.join(units))
for test, expected in zip(tests, expecteds):
    actual = re.sub(regexp, r'\1', test)
    assert actual == expected
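If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: the name strip_units is made up for illustration, and the re.escape call is my addition, on the assumption that a unit might one day contain a regex metacharacter such as a dot. The \b keeps a unit like 'in' from matching the start of a longer word such as 'inches'.

```python
import re

def strip_units(text, units):
    """Remove any of the given unit words that directly follow a number."""
    # Escape each unit so characters like '.' are matched literally
    alternation = '|'.join(re.escape(u) for u in units)
    # Keep the number (group 1); drop optional whitespace and the unit
    pattern = re.compile(r'(\d+)\s*(?:%s)\b' % alternation)
    return pattern.sub(r'\1', text)

for s in ['12in desk', '12 in desk', 'abc 20 ft long', '12 inches']:
    print(repr(s), '->', repr(strip_units(s, ['in', 'ft'])))
```

Unlike a single match, sub replaces every occurrence in the string, so something like '3 ft by 4 ft' is handled in one call.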
Upvotes: 3