pythonintraining

Reputation: 79

Regular Expressions using Substitution to convert numbers

I'm a Python beginner, so keep in mind my regex skills are level -122.

I need to convert a string with text containing file1 to file01, but not convert file10 to file010.

My program is wrong, but it's the closest I can get. I've tried dozens of combinations and none of them come any closer:

import re
txt = 'file8, file9, file10'
pat = r"[0-9]"
regexp = re.compile(pat)
print(regexp.sub(r"0\d", txt))

Can someone tell me what's wrong with my pattern and substitution and give me some suggestions?

Upvotes: 1

Views: 129

Answers (3)

Janne Karila

Reputation: 25197

This approach uses a regex to find every sequence of digits and str.zfill to pad with zeros:

>>> import re
>>> txt = 'file8, file9, file10'
>>> re.sub(r'\d+', lambda m: m.group().zfill(2), txt)
'file08, file09, file10'
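
The same idea can be wrapped in a small function if the padding width needs to vary. This is only a sketch: pad_numbers and its width parameter are names made up for illustration, not part of the re module.

```python
import re

def pad_numbers(text, width=2):
    # Replace every run of digits with the same digits, left-padded
    # with zeros up to `width` characters. Longer runs are unchanged.
    return re.sub(r"\d+", lambda m: m.group().zfill(width), text)

print(pad_numbers("file8, file9, file10"))  # file08, file09, file10
```

Because \d+ grabs the whole number before zfill sees it, 10 is already two characters long and stays as it is.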

Upvotes: 0

Jerry

Reputation: 71538

You could capture the number and check the length before adding 0, but you might be able to use this instead:

import re
txt = 'file8, file9, file10'
pat = r"(?<!\d)(\d)(?=,|$)"
regexp = re.compile(pat)
print(regexp.sub(r"0\1", txt))

regex101 demo

(?<! ... ) is called a negative lookbehind. It prevents a match (hence negative) when the text immediately before the current position matches the pattern inside the lookbehind. For example, (?<!a)b matches every b in a string except one that has an a right before it: every b in bb and cb matches, but the b in ab doesn't. (?<!\d)(\d) thus matches a digit, unless it has another digit before it.

(\d) is a single digit, enclosed in a capture group, denoted by simple parentheses. The matched digit is stored in the first capture group, which the replacement refers to as \1.

(?= ... ) is a positive lookahead. The match succeeds only if the pattern inside the lookahead matches the text immediately after the current position; the lookahead itself consumes no characters. In other words, a(?=b) matches an a only if there's a b right after it. The a in ab matches, but no a in ac or aa does.
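
The two small examples above can be checked directly. This is just a quick sketch; the test strings and the variable names are made up for illustration.

```python
import re

# (?<!a)b : a b that does not have an a right before it.
# In "bb cb ab" the b's sit at indexes 0, 1, 4 and 7; only the one at 7 follows an a.
lookbehind_hits = [m.start() for m in re.finditer(r"(?<!a)b", "bb cb ab")]
print(lookbehind_hits)  # [0, 1, 4]

# a(?=b) : an a that has a b right after it.
# In "ab ac aa" only the a at index 0 is followed by a b.
lookahead_hits = [m.start() for m in re.finditer(r"a(?=b)", "ab ac aa")]
print(lookahead_hits)  # [0]
```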

(?=,|$) is a positive lookahead containing ,|$ meaning either a comma, or the end of the string.

(?<!\d)(\d)(?=,|$) thus matches any digit, as long as there's no digit before it and there's a comma after it, or if that digit is at the end of the string.
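
The other route mentioned at the start of this answer, capturing the whole number and checking its length before adding the 0, might look roughly like this. It is a sketch only; pad_single_digit is a name made up for illustration.

```python
import re

def pad_single_digit(match):
    number = match.group(1)
    # Only a number that is exactly one digit long gets a leading zero.
    return "0" + number if len(number) == 1 else number

txt = 'file8, file9, file10'
result = re.sub(r"(\d+)", pad_single_digit, txt)
print(result)  # file08, file09, file10
```

This version doesn't depend on what follows the number, so it also works when the names aren't separated by commas.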

Upvotes: 1

VIKASH JAISWAL

Reputation: 838

How about this?

a = 'file1'
a = 'file' + "%02d" % int(a.split('file')[1])
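
That handles a single name. One possible way to apply the same %02d formatting to the whole string from the question is sketched below, assuming the names are separated by ', ' and each one is 'file' followed by digits only. pad_name is a name made up for illustration.

```python
txt = 'file8, file9, file10'

def pad_name(name):
    # Everything after the 'file' prefix is assumed to be the number.
    return 'file' + "%02d" % int(name.split('file')[1])

padded = ', '.join(pad_name(name) for name in txt.split(', '))
print(padded)  # file08, file09, file10
```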

Upvotes: 0
