I am trying to use regex to do the following in a string:
1. If there is a - between two letters, remove it: A-BA should be ABA, and A-B-BAB should be ABBAB.
2. If a letter and a digit are next to each other, put a - symbol between them: 9AHYA7 should be 9-AHYA-7, and 977AB99T5 should be 977-AB-99-T-5.
These patterns are just simple examples. The strings can be more complicated, like these:
HS98743YVJUHGF78BF8HH3JHFC83438VUN5498FCNG
7267-VHSBVH8737HHC8C-HYHFWYFHH-7Y84743YR8437G
The same rules have to be applied to strings like these.
I tried the following code to convert 8T into 8-T:
re.sub(r'\dab-d', '\d-ab-d', s)
Unfortunately, it does not work, and I am not sure how to do it.
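The attempt above fails for two reasons: the pattern \dab-d matches a digit followed by the literal characters ab-d, and in the replacement string \d is not a backreference (since Python 3.7, re.sub rejects unknown escapes like \d in a string replacement with re.error). A minimal sketch of the 8T to 8-T step captures the digit and the letter in groups and re-inserts them with \1 and \2. The function name digit_then_letter is hypothetical, and it assumes uppercase letters as in the examples; it only handles the digit-then-letter direction.

```python
import re

def digit_then_letter(s: str) -> str:
    # Hypothetical helper: capture a digit and the uppercase letter after it,
    # then put them back with a hyphen in between using \1 and \2.
    return re.sub(r"([0-9])([A-Z])", r"\1-\2", s)

print(digit_then_letter("8T"))  # 8-T
```

The reverse case (a letter followed by a digit, as in A7) needs a second alternative, which is what the answers below handle.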
You could use two capturing groups with lookarounds and, in the replacement, use a lambda to check which group matched.
If group 1 matched, remove the last character (the hyphen). If group 2 matched, append a hyphen.
([A-Z]-(?=[A-Z]))|([A-Z](?=[0-9])|[0-9](?=[A-Z]))
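To see how the lambda can tell the cases apart, a small sketch can list each match together with the group that produced it. The helper name describe_matches and the sample string "A-B9C" are illustrative assumptions, not part of this answer. Because the lookaheads consume nothing, each match is either a letter plus hyphen (group 1) or a single character (group 2).

```python
import re

PATTERN = r"([A-Z]-(?=[A-Z]))|([A-Z](?=[0-9])|[0-9](?=[A-Z]))"

def describe_matches(s):
    """Hypothetical helper: return (start, matched_text, group_number) per match."""
    return [
        # group(1) is None when the first alternative did not participate
        (m.start(), m.group(0), 1 if m.group(1) else 2)
        for m in re.finditer(PATTERN, s)
    ]

for start, text, group in describe_matches("A-B9C"):
    print(start, repr(text), "group", group)
```

Here "A-" comes from group 1 (so the hyphen is dropped), while "B" and "9" come from group 2 (so a hyphen is appended), giving AB-9-C after replacement.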
Explanation
(                  Capture group 1
  [A-Z]-(?=[A-Z])  Match A-Z and -, and assert what is on the right is A-Z
)                  Close group
|                  Or
(                  Capture group 2
  [A-Z](?=[0-9])   Match A-Z and assert what is on the right is a digit
  |                Or
  [0-9](?=[A-Z])   Match 0-9 and assert what is on the right is A-Z
)                  Close group
Example code
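import re

# Group 1: letter + hyphen with a letter ahead (hyphen to remove).
# Group 2: letter before a digit, or digit before a letter (hyphen to add).
pattern = r"([A-Z]-(?=[A-Z]))|([A-Z](?=[0-9])|[0-9](?=[A-Z]))"

strings = [
    "A-BA",
    "A-B-BAB",
    "9AHYA7",
    "977AB99T5",
    "HS98743YVJUHGF78BF8HH3JHFC83438VUN5498FCNG",
    "7267-VHSBVH8737HHC8C-HYHFWYFHH-7Y84743YR8437G",
]

for s in strings:  # avoid shadowing the built-in str
    result = re.sub(
        pattern,
        # Group 1 matched: drop the trailing hyphen; otherwise append one.
        lambda m: m.group(1)[:-1] if m.group(1) else m.group(2) + "-",
        s,
    )
    print(result)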
Output
ABA
ABBAB
9-AHYA-7
977-AB-99-T-5
HS-98743-YVJUHGF-78-BF-8-HH-3-JHFC-83438-VUN-5498-FCNG
7267-VHSBVH-8737-HHC-8-CHYHFWYFHH-7-Y-84743-YR-8437-G
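As an alternative that avoids the lambda (a different technique from the one in this answer), the two rules can be applied as two separate substitutions that use only lookarounds: first delete a hyphen that has a letter on both sides, then insert a hyphen at every empty position between a letter and a digit. This is a sketch assuming, like the pattern above, that letters are uppercase A-Z; the names normalize, HYPHEN_BETWEEN_LETTERS and LETTER_DIGIT_BOUNDARY are hypothetical.

```python
import re

# Pass 1: a hyphen with an uppercase letter on both sides.
HYPHEN_BETWEEN_LETTERS = re.compile(r"(?<=[A-Z])-(?=[A-Z])")
# Pass 2: the empty position between a letter and a digit, in either order.
LETTER_DIGIT_BOUNDARY = re.compile(r"(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])")

def normalize(s: str) -> str:
    s = HYPHEN_BETWEEN_LETTERS.sub("", s)    # rule 1: remove the hyphen
    return LETTER_DIGIT_BOUNDARY.sub("-", s)  # rule 2: insert a hyphen

for s in ["A-B-BAB", "977AB99T5", "7267-VHSBVH8737HHC8C-HYHFWYFHH-7Y84743YR8437G"]:
    print(normalize(s))
```

The order of the passes does not change the result here: removing a hyphen only joins two letters, and inserting a hyphen only separates a letter from a digit, so neither pass creates work for the other.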
If you want to use re.sub, here is one way, using capture groups:
import re
inp = "8T-ENI-A2"
# Expects: two characters, hyphen, hyphen-free middle, hyphen, two characters.
output = re.sub(r'^(.)(.)-([^-]+)-(.)(.)$', r'\1-\2\3\4-\5', inp)
print(output)
This prints:
8-TENIA-2
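This pattern is tied to one input shape: exactly two characters, a hyphen, a hyphen-free middle part, a hyphen, and two more characters. A quick check, reusing the same pattern and replacement inside a hypothetical helper fixed_shape_sub, shows that strings of any other shape come back unchanged, because re.sub returns the input as-is when nothing matches.

```python
import re

FIXED_SHAPE = r'^(.)(.)-([^-]+)-(.)(.)$'

def fixed_shape_sub(s):
    # Hypothetical wrapper around the same pattern and replacement as above.
    return re.sub(FIXED_SHAPE, r'\1-\2\3\4-\5', s)

for s in ["8T-ENI-A2", "9AHYA7", "A-B-BAB"]:
    print(s, "->", fixed_shape_sub(s))
```

Only the first string is rewritten; 9AHYA7 and A-B-BAB do not fit the shape, so they are printed unchanged.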