Lone Learner

Reputation: 20698

Capture all repetitions of a group using Python regular expression

I have an input of the following format:

<integer>: <word> ... # <comment>

where ... can represent one or more <word> strings.

Here is an example:

1: foo bar baz # This is an example

I want to split this input apart with a regular expression and return a tuple that contains the integer followed by each word. For the above example, I want:

(1, 'foo', 'bar', 'baz')

This is what I have tried.

>>> re.match(r'(\d+):( \w+)+', '1: foo bar baz # This is an example').groups()
('1', ' baz')

I am getting the integer and the last word only. How do I get the integer and all the words that the regex matches?

Upvotes: 2

Views: 538

Answers (3)

georg

Reputation: 215039

The trick here is to use lookaheads: let's find either digits (followed by a colon) or words (followed by letters/spaces and a hash):

import re

s = "1: foo bar baz # This is an example"
print(re.findall(r'\d+(?=:)|\w+(?=[\w\s]*#)', s))
# ['1', 'foo', 'bar', 'baz']

The only thing that remains is to convert "1" to an int, which a regex cannot do by itself.
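
A minimal sketch of that last step, assuming the input always follows the format from the question and the comment itself contains no further `#`. The helper name `parse_line` is made up for illustration:

```python
import re

def parse_line(line):
    """Return (integer, word, word, ...) for '<integer>: <word> ... # <comment>'."""
    # Same lookahead-based pattern as above: digits before a colon,
    # or words that still have a '#' somewhere after them.
    tokens = re.findall(r'\d+(?=:)|\w+(?=[\w\s]*#)', line)
    # The first token is the integer; convert it and keep the words as strings.
    return (int(tokens[0]), *tokens[1:])

print(parse_line("1: foo bar baz # This is an example"))
# (1, 'foo', 'bar', 'baz')
```

Note that the words are only recognised because of the trailing `#`, so a line without a comment would yield just the integer.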

Upvotes: 1

Chris Arena

Reputation: 1630

You can probably make it a lot clearer with simple string manipulation.

my_string = '1: foo bar baz'
num_string, word_string = my_string.split(':')
num = int(num_string)
words = word_string.strip().split(' ')

print(num)
print(words)

Output:

1
['foo', 'bar', 'baz']
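
The snippet above uses an input without the trailing comment. A rough adaptation for the full format from the question could drop everything from the first `#` onward before splitting. The variable names here are illustrative only:

```python
line = '1: foo bar baz # This is an example'

# Keep only the part before the first '#', i.e. discard the comment.
body = line.split('#', 1)[0]

# Split once on the colon so that any later colons stay with the words.
num_string, word_string = body.split(':', 1)

# split() with no argument also copes with repeated or trailing spaces.
result = (int(num_string), *word_string.split())
print(result)
# (1, 'foo', 'bar', 'baz')
```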

Upvotes: 1

Ashwini Chaudhary

Reputation: 251146

Non-regex solution:

>>> s = '1: foo bar baz # This is an example'
>>> a, _, b = s.partition(':')
>>> [int(a)] + b.partition('#')[0].split()
[1, 'foo', 'bar', 'baz']
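
If a regex is still preferred, one possible variant stays close to the pattern from the question. Python's `re` module keeps only the last match of a repeated capturing group, which is why `( \w+)+` returned only `' baz'`. The sketch below instead captures the whole run of words as a single group and splits it afterwards; it assumes words are separated by exactly one space, as in the question's pattern:

```python
import re

line = '1: foo bar baz # This is an example'

# Group 1: the integer. Group 2: the entire run of words.
# The inner (?: \w+) is non-capturing, so repeating it does not overwrite anything.
match = re.match(r'(\d+):((?: \w+)+)', line)
if match is None:
    raise ValueError(f'unexpected input: {line!r}')

number, words = match.groups()          # '1' and ' foo bar baz'
parsed = (int(number), *words.split())
print(parsed)
# (1, 'foo', 'bar', 'baz')
```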

Upvotes: 2
