Reputation: 1041
I have a bunch of strings that I need to clean, and they follow these patterns:
12345SNET1
1234567SNETA2
123456SNET3
The headache is that whatever follows SNET can be either a single digit (0-9) or an uppercase letter (A-Z) followed by a single digit.
Is there any way to use a regex to detect which pattern a string has, so that I could do something like:
if regex detect (returns True):
    str = str[:-1]
elif regex detect (returns True):
    str = str[:-2]
Upvotes: 0
Views: 115
Reputation: 54168
You can use re.fullmatch for the check (it returns a match object, which is truthy, only if the entire string matches the regex; otherwise it returns None) with basic regexes like .*SNET\d and .*SNET[A-Z]\d. Also, don't use str as a variable name: it shadows the built-in str type.
import re

if re.fullmatch(r".*SNET\d", value):
    value = value[:-1]
elif re.fullmatch(r".*SNET[A-Z]\d", value):
    value = value[:-2]
You can also use re.sub directly to remove the suffix:
value = re.sub(r"(?<=SNET)[A-Z]?\d", "", value)
For reuse, you can wrap this in a function:
import re

def clean(value):
    if re.fullmatch(r".*SNET\d", value):
        return value[:-1]
    if re.fullmatch(r".*SNET[A-Z]\d", value):
        return value[:-2]
    return value

# OR
def clean(value):
    return re.sub(r"(?<=SNET)[A-Z]?\d", "", value)

if __name__ == '__main__':
    values = ["12345SNET1", "1234567SNETA2", "123456SNET3"]
    print(values)  # ['12345SNET1', '1234567SNETA2', '123456SNET3']
    values = list(map(clean, values))
    print(values)  # ['12345SNET', '1234567SNET', '123456SNET']
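One caveat with the re.sub version: the pattern is not anchored, so it removes a matching suffix wherever SNET occurs, not only at the end of the string. Below is a minimal sketch of an end-anchored variant, assuming the suffix should be stripped only when it is the last thing in the string. The names SUFFIX_AT_END and clean_anchored, and the third sample value, are made up for illustration.

```python
import re

# Hypothetical helper: the trailing $ is the only change from the pattern above.
SUFFIX_AT_END = re.compile(r"(?<=SNET)[A-Z]?\d$")

def clean_anchored(value):
    """Strip one digit, or one uppercase letter plus one digit, that follows SNET at the end."""
    return SUFFIX_AT_END.sub("", value)

print(clean_anchored("12345SNET1"))     # 12345SNET
print(clean_anchored("1234567SNETA2"))  # 1234567SNET
print(clean_anchored("12SNET1ABC"))     # 12SNET1ABC (unchanged: the suffix is not at the end)
```

Whether the anchor is wanted depends on the data: if SNET plus a suffix can legitimately appear mid-string and should be cleaned there too, the unanchored pattern is the one to keep.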
Upvotes: 2
Reputation: 13232
You don't need to have two cases if you use the right regular expression.
import re

values = ["12345SNET1", "1234567SNETA2", "123456SNET3"]
for value in values:
    m = re.match(r'\d+SNET([A-Z]?\d)', value)
    if m:
        print(m.group(1))
This will print
1
A2
3
If you want the text before the trailing character(s) instead, add a second pair of parentheses to the regular expression to capture that part:
values = ["12345SNET1", "1234567SNETA2", "123456SNET3"]
for value in values:
    m = re.match(r'(\d+SNET)([A-Z]?\d)', value)
    if m:
        print(m.group(1))
Result
12345SNET
1234567SNET
123456SNET
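Note that re.match only anchors at the start of the string, so a value with extra trailing text would still match. If that matters, here is a possible sketch using fullmatch with named groups, assuming the whole string must fit the pattern. The function name split_snet, the group names, and the third sample value are illustrative, not part of the original answer.

```python
import re

# Hypothetical helper: the group names "prefix" and "suffix" are illustrative.
SNET_PATTERN = re.compile(r"(?P<prefix>\d+SNET)(?P<suffix>[A-Z]?\d)")

def split_snet(value):
    """Return (prefix, suffix) if the whole string fits the pattern, else None."""
    m = SNET_PATTERN.fullmatch(value)
    if m is None:
        return None
    return m.group("prefix"), m.group("suffix")

print(split_snet("1234567SNETA2"))  # ('1234567SNET', 'A2')
print(split_snet("12345SNET1"))     # ('12345SNET', '1')
print(split_snet("12345SNET1xyz"))  # None
```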
Upvotes: 1
Reputation: 36700
You might use re.sub combined with a positive lookbehind to drop the unwanted characters, in the following way:
import re
s1 = "12345SNET1"
s2 = "1234567SNETA2"
s3 = "123456SNET3"
out1 = re.sub(r"(?<=SNET)[A-Z]?\d", "", s1)
out2 = re.sub(r"(?<=SNET)[A-Z]?\d", "", s2)
out3 = re.sub(r"(?<=SNET)[A-Z]?\d", "", s3)
print(out1) # 12345SNET
print(out2) # 1234567SNET
print(out3) # 123456SNET
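For comparison, the same result can probably be had without a lookbehind by capturing SNET and writing it back in the replacement. This is only a sketch of that alternative, with the pattern compiled once and reused. The names pattern and strip_suffix are hypothetical.

```python
import re

# Capture SNET, consume the optional letter and the digit, then put SNET back via \1.
pattern = re.compile(r"(SNET)[A-Z]?\d")

def strip_suffix(value):
    """Remove the [A-Z]?\\d suffix that directly follows SNET, keeping SNET itself."""
    return pattern.sub(r"\1", value)

values = ["12345SNET1", "1234567SNETA2", "123456SNET3"]
print([strip_suffix(v) for v in values])  # ['12345SNET', '1234567SNET', '123456SNET']
```

The lookbehind form keeps the replacement string empty, while the capture-group form works in regex engines that lack lookbehind support. In Python both behave the same on these inputs.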
Upvotes: 2