Reputation: 2562
I have a string format where A = alphanumeric and N = integer, so the template is "AAAAAA-NNNN". The user will sometimes omit the dash, and sometimes the "NNNN" part is only three digits, in which case I need to pad it with a 0. The first digit of "NNNN" has to be 0, so a non-zero digit in that position is the last character of "AAAAAA" rather than the first digit of "NNNN". In essence, given the following inputs I want the following results:
Sample Inputs:
"SAMPLE0001"
"SAMPL1-0002"
"SAMPL3003"
"SAMPLE-004"
Desired Outputs:
"SAMPLE-0001"
"SAMPL1-0002"
"SAMPL3-0003"
"SAMPLE-0004"
I know how to check for this using regular expressions, but essentially I want to do the opposite. I was wondering if there is an easy way to do this other than a nested conditional checking for all these variations. I am using Python and pandas, and a solution in either will suffice.
The regex pattern would be:
"[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]-\d\d\d\d"
or in abbreviated form:
"[a-zA-Z0-9]{6}-[\d]{4}"
Upvotes: 2
Views: 132
Reputation: 13869
An alternative solution that uses str.join:
import re

inputs = ['SAMPLE0001', 'SAMPL1-0002', 'SAMPL3003', 'SAMPLE-004']
outputs = []
for input_ in inputs:
    # Capture the six leading characters and the last three digits
    m = re.match(r'(\w{6})-?\d?(\d{3})', input_)
    # Rejoin the two groups with the dash and the leading zero
    outputs.append('-0'.join(m.groups()))
print(outputs)
# ['SAMPLE-0001', 'SAMPL1-0002', 'SAMPL3-0003', 'SAMPLE-0004']
We are matching the regex (\w{6})-?\d?(\d{3}) against the input strings and joining the captured groups with the string '-0'. This is very simple and fast.
Let me know if you need a more in-depth explanation of the regex itself.
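As a hedged sketch of how the same idea might be wrapped for reuse, the snippet below anchors the pattern with re.fullmatch and returns None for inputs that do not fit. The names PATTERN and normalize are made up for illustration. Two caveats follow from the regex itself: \w also accepts an underscore, and the optional digit is discarded and replaced by 0, so an input such as "SAMPLE-1234" would come back as "SAMPLE-0234". That is only acceptable if the first digit really must be 0, as the question states.

```python
import re

# Same groups as in the answer above; fullmatch anchors them to the whole string.
PATTERN = re.compile(r"(\w{6})-?\d?(\d{3})")

def normalize(code):
    """Return code in AAAAAA-0NNN form, or None if it does not fit the pattern."""
    m = PATTERN.fullmatch(code)
    if m is None:
        return None
    # Rejoin the two captured groups with the dash and the leading zero.
    return "-0".join(m.groups())

print(normalize("SAMPL3003"))   # SAMPL3-0003
print(normalize("SAMPLE-004"))  # SAMPLE-0004
print(normalize("bad"))         # None
```

Returning None instead of raising lets the caller decide what to do with malformed rows, for example when mapping the function over a column.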
Upvotes: 1
Reputation: 174796
This is possible with two re.sub calls.
>>> import re
>>> s = '''SAMPLE0001
SAMPL1-0002
SAMPL3003
SAMPLE-004'''
>>> print(re.sub(r'(?m)(?<=-)(?=\d{3}$)', '0', re.sub(r'(?m)(?<=^[A-Z\d]{6})(?!-)', '-', s)))
SAMPLE-0001
SAMPL1-0002
SAMPL3-0003
SAMPLE-0004
Explanation:
re.sub(r'(?m)(?<=^[A-Z\d]{6})(?!-)', '-', s)
is processed first. It places a hyphen after the 6th character from the start of each line, but only if the following character is not already a hyphen.
re.sub(r'(?m)(?<=-)(?=\d{3}$)', '0', re.sub(r'(?m)(?<=^[A-Z\d]{6})(?!-)', '-', s))
Taking the output of the first call as its input, the second call inserts a 0 right after the hyphen, but only if exactly three digits follow it before the end of the line.
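To make the two passes easier to follow, here is a small sketch that runs them one at a time and prints the intermediate result. The variable names step1 and step2 are illustrative only. Note that the pattern [A-Z\d] covers uppercase letters and digits; lowercase input would presumably need [A-Za-z\d] instead.

```python
import re

s = "SAMPLE0001\nSAMPL1-0002\nSAMPL3003\nSAMPLE-004"

# Step 1: insert a hyphen after the 6th character where one is missing.
step1 = re.sub(r"(?m)(?<=^[A-Z\d]{6})(?!-)", "-", s)
print(step1.splitlines())
# ['SAMPLE-0001', 'SAMPL1-0002', 'SAMPL3-003', 'SAMPLE-004']

# Step 2: insert a 0 after the hyphen where only three digits follow.
step2 = re.sub(r"(?m)(?<=-)(?=\d{3}$)", "0", step1)
print(step2.splitlines())
# ['SAMPLE-0001', 'SAMPL1-0002', 'SAMPL3-0003', 'SAMPLE-0004']
```

Both patterns match an empty position rather than any characters, so each call only inserts text and never removes anything from the input.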
Upvotes: 2