Arun Kumar

Reputation: 694

Python regex: how to insert a hyphen between a letter and a digit, and also remove a hyphen between two letters

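I am trying to use regex to do the following in a string: insert a hyphen between a letter and a digit (in either order), and remove any hyphen between two letters.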

These patterns are just simple examples. The string could be more complicated, like this:

The same rules have to apply to longer strings like these.

I tried the following code to convert 8T into 8-T:

    re.sub(r'\dab-d', '\d-ab-d', s)

Unfortunately it does not work. I am not sure how to do it.
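The attempt above fails for two reasons. In the pattern `r'\dab-d'` only `\d` is special, so `ab-d` is matched as literal text and the pattern never matches `8T`. In the replacement `'\d-ab-d'`, `\d` is not a group reference, and Python 3.7+ rejects unknown letter escapes in a replacement string with `re.error`. A minimal sketch for just the `8T` case captures the digit and the letter as groups and writes them back with a hyphen between them (the helper name `insert_hyphen` is hypothetical, and only uppercase letters are assumed):

```python
import re

def insert_hyphen(s):
    # Hypothetical helper: capture a digit (group 1) followed by an
    # uppercase letter (group 2) and write them back with a hyphen between.
    return re.sub(r'(\d)([A-Z])', r'\1-\2', s)

print(insert_hyphen("8T"))  # 8-T
```

Swapping the character classes, `r'([A-Z])(\d)'` with the same replacement, covers a letter followed by a digit.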

Upvotes: 2

Views: 1758

Answers (2)

The fourth bird

Reputation: 163362

You might use two capturing groups with lookaheads, and in the replacement use a lambda to check which group matched.

If group 1 matched, drop its last character (the hyphen). If group 2 matched, append a hyphen.

    ([A-Z]-(?=[A-Z]))|([A-Z](?=[0-9])|[0-9](?=[A-Z]))

Explanation

  • ( Capture group 1
    • [A-Z]-(?=[A-Z]) Match an uppercase letter followed by -, and assert that an uppercase letter comes next
  • ) Close group
  • | Or
  • ( Capture group 2
    • [A-Z](?=[0-9]) Match an uppercase letter and assert that a digit comes next
    • | Or
    • [0-9](?=[A-Z]) Match a digit and assert that an uppercase letter comes next
  • ) Close group
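To see which group fires where, the same pattern can be compiled with `re.VERBOSE` so each branch carries its own comment, and `Match.lastindex` reports which of the two capturing groups took part in each match. This is an illustrative sketch; the helper name `which_group` is hypothetical:

```python
import re

# Same pattern as above, spread out with re.VERBOSE (whitespace and
# comments outside character classes are ignored).
PATTERN = re.compile(r"""
    ([A-Z]-(?=[A-Z]))     # group 1: letter + hyphen, a letter follows
    |
    ([A-Z](?=[0-9])       # group 2: letter, a digit follows
     |[0-9](?=[A-Z]))     #          or digit, a letter follows
""", re.VERBOSE)

def which_group(s):
    # lastindex is the number of the capturing group that matched
    return [(m.lastindex, m.group()) for m in PATTERN.finditer(s)]

print(which_group("9A-B7"))  # [(2, '9'), (1, 'A-'), (2, 'B')]
```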


Example code

    import re

    pattern = r"([A-Z]-(?=[A-Z]))|([A-Z](?=[0-9])|[0-9](?=[A-Z]))"
    strings = [
        "A-BA",
        "A-B-BAB",
        "9AHYA7",
        "977AB99T5",
        "HS98743YVJUHGF78BF8HH3JHFC83438VUN5498FCNG",
        "7267-VHSBVH8737HHC8C-HYHFWYFHH-7Y84743YR8437G"
    ]

    # Use `s` rather than `str` so the built-in is not shadowed
    for s in strings:
        result = re.sub(
            pattern,
            # group 1 ("X-"): drop the hyphen; group 2: append a hyphen
            lambda m: m.group(1)[:-1] if m.group(1) else m.group(2) + "-",
            s
        )
        print(result)

Output

    ABA
    ABBAB
    9-AHYA-7
    977-AB-99-T-5
    HS-98743-YVJUHGF-78-BF-8-HH-3-JHFC-83438-VUN-5498-FCNG
    7267-VHSBVH-8737-HHC-8-CHYHFWYFHH-7-Y-84743-YR-8437-G
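A different technique, not the one this answer uses, is two passes with zero-width lookarounds: first delete any hyphen that has an uppercase letter on both sides, then insert a hyphen at every empty position between a letter and a digit. The second pass relies on `re.sub` replacing empty matches, which it does in Python 3. The function name `hyphenate` is hypothetical, and like the answer above the sketch only considers uppercase A-Z; for the sample strings in this answer it produces the same output.

```python
import re

# Pass 1: a hyphen with an uppercase letter on each side.
LETTER_HYPHEN_LETTER = re.compile(r'(?<=[A-Z])-(?=[A-Z])')
# Pass 2: the empty position between a letter and a digit, in either order.
LETTER_DIGIT_BOUNDARY = re.compile(r'(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])')

def hyphenate(s):
    s = LETTER_HYPHEN_LETTER.sub('', s)        # "C-H" -> "CH"
    return LETTER_DIGIT_BOUNDARY.sub('-', s)   # "8C" -> "8-C"

for s in ["A-B-BAB", "977AB99T5", "9AHYA7"]:
    print(hyphenate(s))
```

Neither pass touches a hyphen that sits next to a digit, so `7267-V` and `H-7` are kept, as in the output above.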

Upvotes: 1

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521419

If you want to use re.sub, then here is one way, using capture groups:

    import re

    inp = "8T-ENI-A2"
    # Keep char 1, add "-", join char 2 + middle + char 4, add "-", keep the last char
    output = re.sub(r'^(.)(.)-([^-]+)-(.)(.)$', r'\1-\2\3\4-\5', inp)
    print(output)

This prints:

    8-TENIA-2
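This pattern is anchored with `^` and `$` and fixes the layout of the whole string: two characters, a hyphen, a run of non-hyphen characters, a hyphen, two characters. `re.subn` makes that visible because it also returns the number of substitutions made; strings with any other layout come back unchanged with a count of 0. The helper name `rewrite` is hypothetical:

```python
import re

# Anchored pattern and replacement from this answer
PATTERN = r'^(.)(.)-([^-]+)-(.)(.)$'
REPLACEMENT = r'\1-\2\3\4-\5'

def rewrite(s):
    # re.subn returns (new_string, number_of_substitutions_made)
    return re.subn(PATTERN, REPLACEMENT, s)

for s in ["8T-ENI-A2", "9AHYA7", "A-B-BAB"]:
    print(s, "->", rewrite(s))
```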

Upvotes: 1
