Reputation: 1129
I want to turn a string that looks like this:
ABC12DEF3G56HIJ7
into
12 * ABC
3  * DEF
56 * G
7  * HIJ
I want to construct the correct set of loops using regex matching. The crux of the issue is that the code has to be completely general because I cannot assume how long the [A-Z] fragments will be, nor how long the [0-9] fragments will be.
Upvotes: 107
Views: 139126
Reputation: 21
Looking at the OP's post, the format of the output seems to matter: the * is aligned to the length of the longest number.
If that is the case, then this example produces the desired output:
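import re
s = "ABC12DEF3G56HIJ7"
pattern = re.compile(r'([A-Z]+)([0-9]+)')
# findall() returns a list of (letters, digits) tuples
matches = pattern.findall(s)
# key is the digit string; find the width of the longest one
key_length = 0
for (_, key) in matches:
    key_length = max(key_length, len(key))
# left-align each number in that width so the * lines up
for (value, key) in matches:
    print(f"{key:<{key_length}} * {value}")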
This could be done in a more Pythonic way, such as with a list comprehension, but that has been avoided here for clarity.
E.g., the output for s = "ABC1215431DEF3G56HIJ7" would be:
1215431 * ABC
3       * DEF
56      * G
7       * HIJ
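For what it's worth, the more compact variant mentioned above might look roughly like this. It is only a sketch: the helper name format_pairs is made up for illustration, and it assumes the same uppercase-letters-then-digits input as the question.

```python
import re

def format_pairs(s):
    """Return lines like '12 * ABC', with the '*' aligned to the longest number."""
    # findall() gives a list of (letters, digits) tuples
    pairs = re.findall(r'([A-Z]+)([0-9]+)', s)
    if not pairs:
        # max() on an empty sequence would raise ValueError
        return []
    # width of the longest digit string
    width = max(len(digits) for _, digits in pairs)
    # left-align each number in that width
    return [f"{digits:<{width}} * {letters}" for letters, digits in pairs]

for line in format_pairs("ABC1215431DEF3G56HIJ7"):
    print(line)
```

Returning the lines instead of printing them inside the helper keeps the formatting easy to check separately from the output.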
Upvotes: 2
Reputation: 39
A somewhat simpler one-liner would be:
print(re.sub(r"([A-Z]+)(\d+)", r'\2 * \1\n', s))
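Spelled out as a self-contained snippet, the one-liner needs the import and a value for s. The sketch below assumes the sample string from the question and keeps the intermediate string in a variable (result is just an illustrative name) so the trailing newline can be stripped:

```python
import re

s = "ABC12DEF3G56HIJ7"
# Each letters+digits pair is rewritten as "digits * letters" plus a newline.
result = re.sub(r"([A-Z]+)(\d+)", r"\2 * \1\n", s)
# The last pair gets a newline too, so strip it before printing
# to avoid an extra blank line at the end.
print(result.rstrip("\n"))
```

One thing to keep in mind with this approach: re.sub() leaves any text that does not match the pattern in place, so stray characters in the input would show up in the output, whereas findall() or finditer() would simply skip them.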
Upvotes: 1
Reputation: 23051
Yet another option is to use re.sub() to create the desired strings from the captured groups:
import re
s = 'ABC12DEF3G56HIJ7'
for x in re.sub(r"([A-Z]+)(\d+)", r'\2 * \1,', s).rstrip(',').split(','):
    print(x)
12 * ABC
3 * DEF
56 * G
7 * HIJ
Upvotes: 0
Reputation: 13718
It is better to use re.finditer if your dataset is large, because that reduces memory consumption (findall() returns a list of all results, whereas finditer() finds them one by one).
import re
s = "ABC12DEF3G56HIJ7"
pattern = re.compile(r'([A-Z]+)([0-9]+)')
for m in re.finditer(pattern, s):
    # group(1) is the letters, group(2) is the digits
    print(m.group(2), '*', m.group(1))
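If the pairs are needed elsewhere rather than printed straight away, the same lazy behaviour can be kept by wrapping finditer() in a generator. This is a rough sketch only: the names PAIR and iter_pairs are invented here, and converting the digits to int is an assumption about how the counts will be used.

```python
import re

PAIR = re.compile(r'([A-Z]+)([0-9]+)')

def iter_pairs(s):
    """Yield (count, letters) tuples one at a time, without building a list."""
    for m in PAIR.finditer(s):
        # group(2) is the digit run, group(1) is the letter run
        yield int(m.group(2)), m.group(1)

for count, letters in iter_pairs("ABC12DEF3G56HIJ7"):
    print(count, '*', letters)
```

Because iter_pairs is a generator, each match is only produced when the loop asks for the next one, which is the memory saving this answer is pointing at.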
Upvotes: 99