findall() returning whole string instead of list

Question

I am trying to use regex in Python to match a string:

import re
pattern = re.compile(r"(\d+?\,\s[a-zA-Z]+?\,\s\d{4}\-\d{2}\-\d{2})")
string = '[ 1234, jack, 1987-09-02]'
ret = pattern.findall(string)

This returns the whole match as a single list element: ['1234, jack, 1987-09-02']

but I am trying to get a list with each item as a separate element: ['1234', 'jack', '1987-09-02']

I know '+' is greedy, but I added '?' to make it lazy.

Wiktor Stribiżew · Accepted Answer

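Your pattern matches the whole contents inside the square brackets as one match, while you seem to want only the chunks that consist of word and hyphen characters. Making the quantifiers lazy with +? does not change this: a lazy quantifier only controls how much text that part of the pattern consumes, and your pattern can only succeed by matching all three fields together.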
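To illustrate, the sketch below runs the question's lazy pattern and a greedy variant (written here only for comparison, it is not from the original post) against the same string. Both return the same one-element list, because each pattern has to consume "1234, jack, 1987-09-02" in a single match.

```python
import re

text = '[ 1234, jack, 1987-09-02]'

# The pattern from the question: lazy quantifiers, one capturing group
lazy = re.compile(r"(\d+?\,\s[a-zA-Z]+?\,\s\d{4}\-\d{2}\-\d{2})")
# Illustrative greedy variant (also drops the redundant \, and \- escapes)
greedy = re.compile(r"(\d+,\s[a-zA-Z]+,\s\d{4}-\d{2}-\d{2})")

# Either way there is exactly one match spanning all three fields
print(lazy.findall(text))    # ['1234, jack, 1987-09-02']
print(greedy.findall(text))  # ['1234, jack, 1987-09-02']
```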

Use

pattern = re.compile(r"[\w-]+")

See the IDEONE demo:

import re
pattern = re.compile(r"[\w-]+")
string = '[ 1234, jack, 1987-09-02]'
ret = pattern.findall(string)
print(ret)
# => ['1234', 'jack', '1987-09-02']

Pattern details: [\w-] is a character class matching a word character (a letter, digit or underscore) or a hyphen, and the + quantifier repeats it one or more times, so each match is a run of such characters.
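A short comparison, using the same input string, of why the hyphen is included in the character class: with \w+ alone, every hyphen ends a match, so the date comes back in three pieces. The variable names here are illustrative.

```python
import re

text = '[ 1234, jack, 1987-09-02]'

# \w alone stops at each hyphen, splitting the date
words_only = re.findall(r"\w+", text)
# Adding "-" to the character class keeps the date in one piece
words_and_hyphens = re.findall(r"[\w-]+", text)

print(words_only)         # ['1234', 'jack', '1987', '09', '02']
print(words_and_hyphens)  # ['1234', 'jack', '1987-09-02']
```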

An alternative solution: match optional whitespace, then match and capture each run of characters other than [, ] and , with

pattern = re.compile(r"\s*([^[\],]+)")

When a pattern contains capturing groups, re.findall returns only the captured values (a list of strings for one group, a list of tuples for several), so only what was captured with (...) is returned here, i.e. each run of one or more characters other than ], [ and ,.
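The effect of capturing groups on re.findall can be seen by running variants of the alternative pattern side by side. This is a sketch: the no-group and two-group patterns are illustrative variants, and [ is escaped inside the class here (\[), which matches the same characters as [^[\],].

```python
import re

text = '[ 1234, jack, 1987-09-02]'

# No groups: findall returns each whole match, leading whitespace included
no_group = re.findall(r"\s*[^\[\],]+", text)
# One group: findall returns only what group 1 captured
one_group = re.findall(r"\s*([^\[\],]+)", text)
# Two groups: findall returns one tuple of captures per match
two_groups = re.findall(r"(\s*)([^\[\],]+)", text)

print(no_group)    # [' 1234', ' jack', ' 1987-09-02']
print(one_group)   # ['1234', 'jack', '1987-09-02']
print(two_groups)  # [(' ', '1234'), (' ', 'jack'), (' ', '1987-09-02')]
```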
