Reputation: 20698
I have an input of the following format:
<integer>: <word> ... # <comment>
where ... can represent one or more <word> strings.
Here is an example:
1: foo bar baz # This is an example
I want to split this input apart with a regular expression and return a tuple that contains the integer followed by each word. For the above example, I want:
(1, 'foo', 'bar', 'baz')
This is what I have tried.
>>> re.match('(\d+):( \w+)+', '1: foo bar baz # This is an example').groups()
('1', ' baz')
I am getting the integer and the last word only. How do I get the integer and all the words that the regex matches?
Upvotes: 2
Views: 538
Reputation: 215039
The trick here is to use lookaheads: find either digits (followed by a colon) or words (followed only by word characters/whitespace and then a hash):
import re
s = "1: foo bar baz # This is an example"
print(re.findall(r'\d+(?=:)|\w+(?=[\w\s]*#)', s))
# ['1', 'foo', 'bar', 'baz']
The only thing that remains is to convert "1" to an int, which the regex cannot do for you: findall always returns strings.
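That last conversion can be done in plain Python after the match. A minimal sketch, assuming the input always has the <integer>: <word> ... # <comment> shape (the function name parse_line is made up for illustration, and an input with no match would raise an IndexError):

```python
import re

def parse_line(s):
    # Same pattern as above: digits before a colon, or words that are
    # followed only by word characters/whitespace up to a '#'.
    tokens = re.findall(r'\d+(?=:)|\w+(?=[\w\s]*#)', s)
    # The first token is the integer; the rest are the words.
    return (int(tokens[0]),) + tuple(tokens[1:])

print(parse_line("1: foo bar baz # This is an example"))
# (1, 'foo', 'bar', 'baz')
```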
Upvotes: 1
Reputation: 1630
You can probably make it a lot clearer with simple string manipulation.
my_string = '1: foo bar baz'
num_string, word_string = my_string.split(':')
num = int(num_string)
words = word_string.strip().split(' ')
print(num)
print(words)
Output:
1
['foo', 'bar', 'baz']
Upvotes: 1
Reputation: 251146
Non-regex solution:
>>> s = '1: foo bar baz # This is an example'
>>> a, _, b = s.partition(':')
>>> [int(a)] + b.partition('#')[0].split()
[1, 'foo', 'bar', 'baz']
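If a tuple is wanted, as in the question, the same partition approach could be wrapped in a small function. This is only a sketch; parse_line_no_regex is a hypothetical name, and it assumes the text before the first colon is a valid integer:

```python
def parse_line_no_regex(s):
    # Split once at the first ':' into the number and everything after it.
    number, _, rest = s.partition(':')
    # Drop the comment (everything from the first '#') and split on whitespace.
    words = rest.partition('#')[0].split()
    # Build the tuple: integer first, then each word.
    return (int(number), *words)

print(parse_line_no_regex('1: foo bar baz # This is an example'))
# (1, 'foo', 'bar', 'baz')
```

Because partition returns an empty tail when the separator is missing, this also works on lines that have no # comment.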
Upvotes: 2