CPL

Reputation: 161

findall() returning whole string instead of list

I am trying to use regex in Python to match a string:

import re
pattern = re.compile(r"(\d+?\,\s[a-zA-Z]+?\,\s\d{4}\-\d{2}\-\d{2})")
string = '[ 1234, jack, 1987-09-02]'
ret = pattern.findall(string)

This returns the whole matched string as a single list element: ['1234, jack, 1987-09-02']

But I am trying to get a list with each field as a separate element: ['1234', 'jack', '1987-09-02']

I know '+' is greedy, but I added '?'.

Upvotes: 1

Views: 2531

Answers (2)

Wiktor Stribiżew

Reputation: 626802

Your pattern matches the whole contents inside the square brackets, while you seem to want to only get the character chunks that consist of word and hyphen characters.

Use

pattern = re.compile(r"[\w-]+")

Full example:

import re
pattern = re.compile(r"[\w-]+")
string = '[ 1234, jack, 1987-09-02]'
ret = pattern.findall(string)
print(ret)
# => ['1234', 'jack', '1987-09-02']

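Pattern details: [\w-] is a character class matching either a word character (a letter, digit or underscore) or a hyphen, and the + quantifier repeats it one or more times. Brackets, commas and spaces are not in the class, so they act as separators between the matches.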
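A quick way to see what the class picks up is to run it on a second string. This is only a sketch; the second input is made up for illustration and is not from the question:

```python
import re

pattern = re.compile(r"[\w-]+")

# The string from the question: brackets, commas and spaces separate the chunks.
print(pattern.findall('[ 1234, jack, 1987-09-02]'))
# ['1234', 'jack', '1987-09-02']

# Hypothetical input: underscores and hyphens stay inside a chunk,
# while a dot or a space splits it.
extra = pattern.findall('[ 12.5, mary_ann, jean-luc picard]')
print(extra)
# ['12', '5', 'mary_ann', 'jean-luc', 'picard']
```

So this pattern is a good fit only if the fields never contain spaces, dots or other non-word characters.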

An alternative solution: match optional whitespace, then capture one or more characters other than square brackets and commas:

pattern = re.compile(r"\s*([^[\],]+)")

When the pattern contains a capturing group, re.findall returns only the captured values, so only what (...) captured (i.e. one or more characters other than ], [ and ,) is returned. The leading whitespace matched by \s* is left out.
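To make the effect of groups on findall concrete, here is a small comparison on the string from the question. The variable names are only for illustration, and the third pattern is a simplified take on the one in the question:

```python
import re

string = '[ 1234, jack, 1987-09-02]'

# No group: findall returns the full text of each match,
# including the whitespace consumed by \s*.
no_group = re.findall(r"\s*[^[\],]+", string)
print(no_group)      # [' 1234', ' jack', ' 1987-09-02']

# One group: findall returns only what the group captured.
one_group = re.findall(r"\s*([^[\],]+)", string)
print(one_group)     # ['1234', 'jack', '1987-09-02']

# Several groups: findall returns one tuple per match.
three_groups = re.findall(r"(\d+),\s([a-zA-Z]+),\s(\d{4}-\d{2}-\d{2})", string)
print(three_groups)  # [('1234', 'jack', '1987-09-02')]
```

This is also why the pattern in the question gives a single element: it has one group wrapped around everything, and it matches only once.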

Upvotes: 1

phihag

Reputation: 287835

Since you only want to match once, use search instead of findall and introduce groups:

>>> import re
>>> string = '[ 1234, jack, 1987-09-02]'
>>> pattern = re.compile(r"(\d+?),\s([a-zA-Z]+?),\s(\d{4}\-\d{2}\-\d{2})")
>>> pattern.search(string).groups()
('1234', 'jack', '1987-09-02')

groups returns a tuple instead of a list, which means the result can be destructured (like number, name, birthday = pattern.search(string).groups()) or passed around, but not added to. If you really need a list, simply use list(pattern.search(string).groups()).
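One thing to watch for: search returns None when the pattern does not match, so calling .groups() directly raises AttributeError on bad input. A minimal sketch of a guarded version follows; parse_record is a hypothetical helper name, not something from the answer above:

```python
import re

pattern = re.compile(r"(\d+?),\s([a-zA-Z]+?),\s(\d{4}-\d{2}-\d{2})")

def parse_record(text):
    """Return [number, name, birthday], or None if text does not match."""
    match = pattern.search(text)
    if match is None:
        return None
    # groups() gives a tuple; convert it only if a list is really needed.
    return list(match.groups())

print(parse_record('[ 1234, jack, 1987-09-02]'))  # ['1234', 'jack', '1987-09-02']
print(parse_record('[ no record here ]'))         # None
```

Returning None is just one option; raising a ValueError with the offending text may be more useful if a non-matching line should never happen.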

Upvotes: 0
