Skorpeo

Reputation: 2562

String Formatting/Template/Regular Expressions

I have a string format where A = alphanumeric and N = digit, so the template is "AAAAAA-NNNN". Sometimes the user will omit the dash, and sometimes "NNNN" is only three digits, in which case I need to pad it with a 0. The first digit of "NNNN" has to be 0, so a digit such as the 3 in "SAMPL3003" is the last character of "AAAAAA" rather than the first digit of "NNNN". In essence, given the following inputs I want the following results:

Sample Inputs:

"SAMPLE0001"
"SAMPL1-0002"
"SAMPL3003"
"SAMPLE-004"

Desired Outputs:

"SAMPLE-0001"
"SAMPL1-0002"
"SAMPL3-0003"
"SAMPLE-0004"

I know how to check for this using regular expressions, but essentially I want to do the opposite. Is there an easy way to do this other than writing nested conditionals that check for all these variations? I am using Python and pandas, but either will suffice.

The regex pattern would be:

"[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]-\d\d\d\d"

or in abbreviated form:

"[a-zA-Z0-9]{6}-[\d]{4}"
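For comparison, here is a minimal sketch of the validation step the question already has, using the abbreviated pattern with re.fullmatch so that the whole string must match (plain re.match would also accept trailing characters, e.g. "SAMPLE-00012"). TEMPLATE and is_valid are hypothetical names chosen for illustration.

```python
import re

# The question's abbreviated pattern; fullmatch requires the entire string
# to match, so no ^...$ anchors are needed.
TEMPLATE = re.compile(r'[a-zA-Z0-9]{6}-\d{4}')

def is_valid(code):
    """Return True if code is already in the AAAAAA-NNNN form."""
    return TEMPLATE.fullmatch(code) is not None

for code in ['SAMPLE0001', 'SAMPL1-0002', 'SAMPL3003', 'SAMPLE-004']:
    print(code, is_valid(code))
```

Only "SAMPL1-0002" passes this check; the other three sample inputs are exactly the cases that need rewriting.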

Upvotes: 2

Views: 132

Answers (2)

Shashank

Reputation: 13869

An alternative solution that uses str.join:

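import re

inputs = ['SAMPLE0001', 'SAMPL1-0002', 'SAMPL3003', 'SAMPLE-004']
outputs = []
for input_ in inputs:
    # group 1: six word characters; then an optional '-' and an optional
    # leading digit (discarded); group 2: the last three digits
    m = re.match(r'(\w{6})-?\d?(\d{3})', input_)
    # rebuild the code as AAAAAA-0NNN
    outputs.append('-0'.join(m.groups()))
print(outputs)
# ['SAMPLE-0001', 'SAMPL1-0002', 'SAMPL3-0003', 'SAMPLE-0004']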

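We match the regex (\w{6})-?\d?(\d{3}) against each input string: the first group captures the six leading characters, -? skips an optional hyphen, \d? consumes an optional leading digit of the number (which is discarded), and the second group captures the last three digits. Joining the two captured groups with the string '-0' rebuilds the canonical form. This is simple and fast.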

Let me know if you need a more in-depth explanation of the regex itself.
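A slightly stricter variant of this answer, sketched under two assumptions: inputs that do not fit the template should return None instead of raising an AttributeError (re.match returns None, and calling .groups() on it fails), and the optional leading digit must be a 0, as the question states, rather than any digit. PATTERN and normalize are hypothetical names, and [A-Za-z0-9] is used in place of \w, which also accepts underscores.

```python
import re

# Assumption: the leading digit of NNNN must be 0, so match 0? rather than \d?.
# fullmatch rejects inputs with extra characters at either end.
PATTERN = re.compile(r'([A-Za-z0-9]{6})-?0?(\d{3})')

def normalize(code):
    """Return code in AAAAAA-NNNN form, or None if it cannot be parsed."""
    m = PATTERN.fullmatch(code)
    if m is None:
        return None
    # group 1 is the alphanumeric prefix, group 2 the last three digits
    return '{}-0{}'.format(*m.groups())

inputs = ['SAMPLE0001', 'SAMPL1-0002', 'SAMPL3003', 'SAMPLE-004']
print([normalize(code) for code in inputs])
```

Because 0? replaces \d?, an input such as "SAMPLE-1234" is rejected instead of having its leading 1 silently dropped. With pandas, the function could be applied element-wise with Series.map(normalize).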

Upvotes: 1

Avinash Raj

Reputation: 174796

This is possible with two re.sub calls.

>>> import re
>>> s = '''SAMPLE0001
SAMPL1-0002
SAMPL3003
SAMPLE-004'''
>>> print(re.sub(r'(?m)(?<=-)(?=\d{3}$)', '0', re.sub(r'(?m)(?<=^[A-Z\d]{6})(?!-)', '-', s)))
SAMPLE-0001
SAMPL1-0002
SAMPL3-0003
SAMPLE-0004

Explanation:

  • re.sub(r'(?m)(?<=^[A-Z\d]{6})(?!-)', '-', s) runs first. It inserts a hyphen after the sixth character of each line, but only if the following character is not already a hyphen. The (?m) flag makes ^ match at the start of every line, not just at the start of the string.

  • re.sub(r'(?m)(?<=-)(?=\d{3}$)', '0', re.sub(r'(?m)(?<=^[A-Z\d]{6})(?!-)', '-', s)) takes the output of the first substitution as its input and inserts a 0 immediately after the hyphen, but only when exactly three digits follow it before the end of the line.
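The same two substitutions can also be split into separate steps and applied to one string at a time, which removes the need for the (?m) flag and makes the intermediate value visible. This is a sketch: add_hyphen, pad_digits and normalize_two_step are hypothetical names, and the character class is widened to [A-Za-z0-9] (an assumption based on the question's pattern), since [A-Z\d] only accepts uppercase letters.

```python
import re

def add_hyphen(code):
    # Step 1: insert '-' after the sixth character unless a '-' is already there.
    return re.sub(r'(?<=^[A-Za-z0-9]{6})(?!-)', '-', code)

def pad_digits(code):
    # Step 2: insert '0' right after the hyphen when exactly three digits follow it.
    return re.sub(r'(?<=-)(?=\d{3}$)', '0', code)

def normalize_two_step(code):
    return pad_digits(add_hyphen(code))

print(add_hyphen('SAMPL3003'))          # intermediate value after step 1
print(normalize_two_step('SAMPL3003'))  # final value after step 2
```

Strings that already have the hyphen and four digits pass through both steps unchanged, because neither lookaround condition is met.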

Upvotes: 2
