Xin

Reputation: 575

Using Python re to remove a unit after a number

Using a Python regular expression, how can I remove the unit word that follows a number?

e.g.

units = ['in', 'ft']
'12in desk' becomes '12 desk'
'12 in desk' becomes '12 desk'
'abc 20 ft long' becomes 'abc 20 long'

Upvotes: 2

Views: 1931

Answers (3)

Bangi

Reputation: 113

The code below removes the unit after a number. It is an alternative to @wesanyer's answer.

import re
units = '|'.join(['in','ft'])
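# Group 1 captures the unit plus any whitespace that precedes it
pattern = r"[0-9]+(\s*(?:" + units + r")\b)"
a = "12in desk"
match = re.search(pattern, a)
if match:
    # Strings are immutable, so build a new string without the captured span
    a = a[:match.start(1)] + a[match.end(1):]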

Upvotes: 0

wesanyer

Reputation: 1012

Here is another way, similar to @Rob's answer but slightly different. Rather than using the re.sub method, I capture all the relevant groups and then put the string back together, omitting the 3rd group, which contains the offending text.

import re

units = '|'.join(['in', 'ft'])

vals = ['12in desk', '12 in desk', 'abc 20 ft long']

pattern = r'([^\d]*)(\d+)\s?({})(.*)'.format(units)

regex = re.compile(pattern)
for val in vals:
    match = regex.match(val)
    if match is None:
        continue  # no number followed by a unit in this string
    out = ''.join(match.group(1, 2, 4))
    print("{} becomes {}".format(val, out))
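To make the grouping concrete, here is a small sketch that inspects what each group holds for one of the sample strings. It reuses the same pattern as above; the variable names `groups` and `rebuilt` are only illustrative.

```python
import re

units = '|'.join(['in', 'ft'])
regex = re.compile(r'([^\d]*)(\d+)\s?({})(.*)'.format(units))

match = regex.match('abc 20 ft long')

# Group 1: text before the number, group 2: the number,
# group 3: the unit, group 4: everything after the unit.
groups = match.groups()

# Rejoin everything except group 3 (the unit).
rebuilt = ''.join(match.group(1, 2, 4))

print(groups)
print(rebuilt)
```

Note that regex.match only considers the first number-plus-unit pair it finds from the start of the string, so a string with several such pairs would need a different approach, such as re.sub.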

Upvotes: 1

Robᵩ

Reputation: 168726

Here is one way, programmatically constructing the regular expression from the units list:

import re

units = ['in', 'ft']
tests = ['12in desk', '12 in desk', 'abc 20 ft long', ]
expecteds = ['12 desk', '12 desk', 'abc 20 long', ]

regexp = re.compile(r'(\d+)\s*(%s)\b' % '|'.join(units))
for test, expected in zip(tests, expecteds):
    actual = re.sub(regexp, r'\1', test)
    assert actual == expected
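If the units list might ever contain characters that are special in a regular expression (for example a dot), one possible variation is to escape each unit before joining. The sketch below wraps the same idea in a helper; the function name `strip_units` is hypothetical and not part of the answer above.

```python
import re


def strip_units(text, units):
    """Remove any of `units` that directly follows a number in `text`."""
    # re.escape keeps characters such as '.' from being read as regex syntax.
    alternation = '|'.join(re.escape(unit) for unit in units)
    # \b stops 'in' from matching the start of a longer word like 'inches'.
    regexp = re.compile(r'(\d+)\s*(?:%s)\b' % alternation)
    # Keep only the number; the whitespace and unit are dropped.
    return regexp.sub(r'\1', text)


if __name__ == '__main__':
    for sample in ['12in desk', '12 in desk', 'abc 20 ft long']:
        print(repr(sample), '->', repr(strip_units(sample, ['in', 'ft'])))
```

Because re.sub replaces every match, this also handles strings that contain more than one number-plus-unit pair.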

Upvotes: 3
