Reputation: 161
I am trying to use regex in Python to match a string:
import re
pattern = re.compile(r"(\d+?\,\s[a-zA-Z]+?\,\s\d{4}\-\d{2}\-\d{2})")
string = '[ 1234, jack, 1987-09-02]'
ret = pattern.findall(string)
This returns the whole match as a single list element: ['1234, jack, 1987-09-02']
but I am trying to get a list with each field as its own element: ['1234', 'jack', '1987-09-02']
I know '+' is greedy, but I added '?' to make it lazy.
Upvotes: 1
Views: 2531
Reputation: 626802
Your pattern matches everything inside the square brackets as one match, and its only capturing group wraps the entire pattern, so findall returns a single element. What you seem to want instead is each chunk of word and hyphen characters.
Use
pattern = re.compile(r"[\w-]+")
Demo:
import re
pattern = re.compile(r"[\w-]+")
string = '[ 1234, jack, 1987-09-02]'
ret = pattern.findall(string)
print(ret)
# => ['1234', 'jack', '1987-09-02']
Pattern details: [\w-]+ is a character class matching either a word character (a letter, digit or underscore) or a hyphen, repeated one or more times (due to the + quantifier).
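To make the behaviour of the character class concrete, here is a small sketch. The second input string is made up for illustration and is not part of the question.

```python
import re

# [\w-]+ : one or more characters that are word characters
# (letters, digits, underscore) or hyphens.
token_pattern = re.compile(r"[\w-]+")

sample = '[ 1234, jack, 1987-09-02]'
tokens = token_pattern.findall(sample)
print(tokens)

# Hypothetical input: underscores and hyphens stay inside a token,
# while spaces, commas and dots end it.
other = 'a_b, c-d, 3.14'
other_tokens = token_pattern.findall(other)
print(other_tokens)
```

Note that anything outside the class acts as a separator, so a value such as 3.14 would be split in two. That is fine for the data in the question but worth keeping in mind for other inputs.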
An alternative solution: match optional whitespace, then match and capture each run of characters other than commas and square brackets with
pattern = re.compile(r"\s*([^[\],]+)")
When the pattern contains a capturing group, re.findall returns only the captured values, so only what was captured with (...) (i.e. one or more characters other than ], [ and ,) will be returned.
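The effect of capturing groups on re.findall can be checked with a short sketch. The variable names are chosen here for illustration, and the third pattern is a simplified variant of the one in the question.

```python
import re

text = '[ 1234, jack, 1987-09-02]'

# No capturing group: findall returns each whole match,
# including the leading whitespace consumed by \s*.
whole = re.findall(r"\s*[^[\],]+", text)
print(whole)

# One capturing group: findall returns only what the group captured.
captured = re.findall(r"\s*([^[\],]+)", text)
print(captured)

# Several capturing groups: findall returns one tuple per match.
triples = re.findall(r"(\d+),\s([a-zA-Z]+),\s(\d{4}-\d{2}-\d{2})", text)
print(triples)
```

This is also why the pattern in the question yields a single element: its one group spans the entire match, so there is exactly one captured value per match.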
Upvotes: 1
Reputation: 287835
Since you only want to match once, use search instead of findall and introduce groups:
>>> import re
>>> string = '[ 1234, jack, 1987-09-02]'
>>> pattern = re.compile(r"(\d+?),\s([a-zA-Z]+?),\s(\d{4}\-\d{2}\-\d{2})")
>>> pattern.search(string).groups()
('1234', 'jack', '1987-09-02')
groups returns a tuple instead of a list, which means the result can be destructured (like number, name, birthday = pattern.search(string).groups()) or passed around, but not added to. If you really need a list, simply use list(pattern.search(string).groups()).
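One caveat: search returns None when nothing matches, so calling .groups() directly on the result raises AttributeError for malformed input. Below is a minimal sketch of a guarded version. parse_record is a hypothetical helper name, the second input is invented, and the lazy ? modifiers are dropped on the assumption that the literal commas already bound each field.

```python
import re

record_pattern = re.compile(r"(\d+),\s([a-zA-Z]+),\s(\d{4}-\d{2}-\d{2})")

def parse_record(text):
    """Return (number, name, birthday) as a tuple, or None if text does not match."""
    match = record_pattern.search(text)
    if match is None:
        # No match: return None instead of failing on .groups()
        return None
    return match.groups()

fields = parse_record('[ 1234, jack, 1987-09-02]')
number, name, birthday = fields  # tuple unpacking
print(number, name, birthday)

# Hypothetical malformed input: no digits, so the pattern cannot match.
missing = parse_record('[ no record here ]')
print(missing)
```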
Upvotes: 0