da5id

Reputation: 1129

Looping through python regex matches

I want to turn a string that looks like this:

ABC12DEF3G56HIJ7

into

12 * ABC
3  * DEF
56 * G
7  * HIJ

I want to construct the correct set of loops using regex matching. The crux of the issue is that the code has to be completely general because I cannot assume how long the [A-Z] fragments will be, nor how long the [0-9] fragments will be.

Upvotes: 107

Views: 139126

Answers (5)

Skrap

Reputation: 21

Looking at the OP's post, the format of the output seems to matter: the * is aligned to the maximum length of the numbers.

If that is the case, then this example produces the desired output:

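import re

s = "ABC12DEF3G56HIJ7"

pattern = re.compile(r'([A-Z]+)([0-9]+)')

# Each match is a (letters, numbers) tuple.
matches = pattern.findall(s)

# Find the length of the longest run of digits.
key_length = 0
for (_, numbers) in matches:
    key_length = max(key_length, len(numbers))

# Left-align each number in a field of that width.
for (letters, numbers) in matches:
    print(f"{numbers:<{key_length}} * {letters}")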

This could be done more Pythonically, e.g. with a list comprehension, but that has been avoided here for clarity.

For example, the output for s = "ABC1215431DEF3G56HIJ7" is:

1215431 * ABC
3       * DEF
56      * G
7       * HIJ
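For reference, here is a sketch of the more compact variant alluded to above (my own rewrite, not necessarily what the answerer had in mind). It computes the width with a generator expression and builds the lines with a list comprehension; default=0 keeps max() from raising ValueError if the input has no matches.

```python
import re

s = "ABC1215431DEF3G56HIJ7"
pattern = re.compile(r'([A-Z]+)([0-9]+)')

matches = pattern.findall(s)

# Width of the longest digit run; default=0 covers an input with no matches.
width = max((len(numbers) for _, numbers in matches), default=0)

# Build each left-aligned line with a list comprehension, then print them together.
lines = [f"{numbers:<{width}} * {letters}" for letters, numbers in matches]
print("\n".join(lines))
```

This prints the same four aligned lines as the loop version.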

Upvotes: 2

Dabble

Reputation: 39

A simpler one-liner:

import re
print(re.sub(r"([A-Z]+)(\d+)", r'\2 * \1\n', s), end='')  # each replacement already ends in \n
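One behavioural difference between this re.sub() approach and the findall()/finditer() loops is worth knowing: re.sub() copies any text the pattern does not match through unchanged, while findall() simply leaves it out. A small illustration, using a made-up input that contains stray lowercase text and a trailing letter run with no digits:

```python
import re

pattern = r"([A-Z]+)(\d+)"
s = "ABC12DEF3xyzG56HIJ"  # hypothetical input with stray text

# re.sub rewrites each match but copies everything else through unchanged.
subbed = re.sub(pattern, r"\2 * \1\n", s)

# re.findall only reports the matches, so the stray text is silently dropped.
found = re.findall(pattern, s)

print(repr(subbed))
print(found)
```

Here the stray "xyz" ends up glued to the front of "56 * G" and "HIJ" is left dangling at the end, while found contains only three pairs. With well-formed input like the OP's, both approaches give the same four lines.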

Upvotes: 1

cottontail

Reputation: 23051

Yet another option is to use re.sub() to build the desired strings from the captured groups:

import re
s = 'ABC12DEF3G56HIJ7'
for x in re.sub(r"([A-Z]+)(\d+)", r'\2 * \1,', s).rstrip(',').split(','):
    print(x)

12 * ABC
3 * DEF
56 * G
7 * HIJ
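The comma here is a separator that gets added and then split on, which would go wrong if the input itself could contain commas. A sketch of one variation (my assumption, not part of the answer above): pass a function as the replacement, which re.sub() calls once per match object, use a newline as the separator, and let splitlines() deal with the trailing one. format_pair is just an illustrative name.

```python
import re

s = 'ABC12DEF3G56HIJ7'
pattern = re.compile(r"([A-Z]+)(\d+)")

def format_pair(m):
    # m is a match object: group 1 holds the letters, group 2 the digits.
    return f"{m.group(2)} * {m.group(1)}\n"

# re.sub / Pattern.sub accept a callable as the replacement.
result = pattern.sub(format_pair, s)

# splitlines() does not produce an empty final element, so no rstrip() is needed.
lines = result.splitlines()
for line in lines:
    print(line)
```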

Upvotes: 0

Mithril

Reputation: 13718

If your dataset is large, re.finditer is the better choice because it reduces memory consumption: findall() returns a list of all results at once, whereas finditer() returns an iterator that yields the matches one by one.

import re

s = "ABC12DEF3G56HIJ7"
pattern = re.compile(r'([A-Z]+)([0-9]+)')

for m in re.finditer(pattern, s):
    print(m.group(2), '*', m.group(1))
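To make the "one by one" point concrete: finditer() returns an iterator, so matches are produced only as you ask for them, and each one is a match object that also knows its position in the string. The sketch below is illustrative; the named groups (letters, numbers) are my addition for readability and are not in the answer above.

```python
import re

s = "ABC12DEF3G56HIJ7"
# Named groups make the intent clearer than numeric group indices.
pattern = re.compile(r'(?P<letters>[A-Z]+)(?P<numbers>[0-9]+)')

matches = pattern.finditer(s)  # an iterator; nothing is collected up front

first = next(matches)  # pull only the first match object
print(first['numbers'], '*', first['letters'], first.span())  # 12 * ABC (0, 5)

# Iteration resumes from where next() left off.
rest = [(m['numbers'], m['letters']) for m in matches]
print(rest)  # [('3', 'DEF'), ('56', 'G'), ('7', 'HIJ')]
```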

Upvotes: 99

Ray Toal

Reputation: 88378

Python's re.findall should work for you.


import re

s = "ABC12DEF3G56HIJ7"
pattern = re.compile(r'([A-Z]+)([0-9]+)')

for (letters, numbers) in re.findall(pattern, s):
    print(numbers, '*', letters)
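The tuples in that loop come from how findall() treats capturing groups: with two or more groups it returns a list of tuples, with exactly one group it returns a list of that group's strings, and with no groups it returns the whole matches. A quick sketch on the same string:

```python
import re

s = "ABC12DEF3G56HIJ7"

# Two capturing groups: a list of (letters, numbers) tuples.
pairs = re.findall(r'([A-Z]+)([0-9]+)', s)

# One capturing group: a list of just the captured digit strings.
numbers_only = re.findall(r'[A-Z]+([0-9]+)', s)

# No capturing groups: a list of the whole matched substrings.
chunks = re.findall(r'[A-Z]+[0-9]+', s)

print(pairs)         # [('ABC', '12'), ('DEF', '3'), ('G', '56'), ('HIJ', '7')]
print(numbers_only)  # ['12', '3', '56', '7']
print(chunks)        # ['ABC12', 'DEF3', 'G56', 'HIJ7']
```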

Upvotes: 157
