danspants

Reputation: 3417

Splitting a filename into words and numbers in Python

The following code splits a string into a list of words but does not include numbers:

    import re

    txt = "there_once was,a-monkey.called phillip?09.txt"
    sep = re.compile(r"[\s\.,-_\?]+")
    sep.split(txt)

['there', 'once', 'was', 'a', 'monkey', 'called', 'phillip', 'txt']

This code gives me words and numbers but still includes "_" as a valid character:

re.findall(r"\w+|\d+",txt)
['there_once', 'was', 'a', 'monkey', 'called', 'phillip', '09', 'txt']

What do I need to alter in either piece of code to end up with the desired result of:

['there', 'once', 'was', 'a', 'monkey', 'called', 'phillip', '09', 'txt']

Upvotes: 1

Views: 2450

Answers (2)

outis

Reputation: 77400

For the example case,

sep = re.compile(r"[^a-zA-Z0-9]+")
sep.split(txt)

should work. To separate numbers from words, try

re.findall(r"[a-zA-Z]+|\d+", txt)
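As a rough sketch of how the two patterns above differ, the snippet below runs both. The second input string, "take2_final", is a made-up example (not from the question) chosen because it has letters directly followed by digits:

```python
import re

txt = "there_once was,a-monkey.called phillip?09.txt"

# Split on runs of anything that is not an ASCII letter or digit.
sep = re.compile(r"[^a-zA-Z0-9]+")
tokens = sep.split(txt)
print(tokens)

# Hypothetical input where a word runs straight into a number.
mixed = "take2_final"

# Letters and digits are matched as separate alternatives,
# so "take2" comes back as two tokens.
letters_and_digits = re.findall(r"[a-zA-Z]+|\d+", mixed)
print(letters_and_digits)

# The split pattern keeps mixed letter/digit runs together.
alnum_runs = sep.split(mixed)
print(alnum_runs)
```

One caveat with the split approach: if the string starts or ends with a separator, the result will contain an empty string at that end, so findall may be the more convenient choice for such input.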

Upvotes: 2

David Z

Reputation: 131570

Here's a quick way that should do it:

re.findall(r"[a-zA-Z0-9]+",txt)

Here's another:

re.split(r"[\s\.,\-_\?]+",txt)

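(You just needed to escape the hyphen, because it has a special meaning in a character class: unescaped, ",-_" is read as the range of characters from "," to "_", which includes the digits, so "09" was being swallowed as a separator.)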
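To see the effect of the hyphen, here is a small sketch comparing the original pattern with the escaped one. The third pattern, with the hyphen moved to the end of the class, is an alternative spelling added for illustration and is not part of the answer above:

```python
import re

txt = "there_once was,a-monkey.called phillip?09.txt"

# Unescaped: ",-_" is a range from "," (0x2C) to "_" (0x5F), which covers 0-9.
unescaped = re.compile(r"[\s\.,-_\?]+")

# Escaped: "\-" is a literal hyphen, so digits are no longer separators.
escaped = re.compile(r"[\s\.,\-_\?]+")

# Alternative (an assumption, not from the answer): a hyphen placed last
# in the class is also taken literally.
trailing = re.compile(r"[\s.,_?-]+")

print(unescaped.split(txt))  # "09" is lost
print(escaped.split(txt))    # "09" is kept
print(trailing.split(txt))   # same as the escaped version

# The digits themselves match the unescaped class but not the escaped one.
print(unescaped.fullmatch("09") is not None)
print(escaped.fullmatch("09") is not None)
```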

Upvotes: 2
