python split string by multiple delimiters and/or combination of multiple delimiters

Question

Input:

x = "121, 1238,\nxyz,\n 123abc \n\nabc123"

I want to split this string on the delimiters ",", "\n", "\r", "\s" to get the output

['121', '1238', 'xyz', '123abc', 'abc123']

Whatever I try, the delimiters are matched as single characters and not as combinations of characters. For example:

1.

re.split("\n|,|\s|\r", x)

Gave output of

['121', '', '1238', '', 'xyz', '', '', '123abc', '', '', 'abc123']

2.

re.split("\n\s|,|\s|\r", x)

Gave output of

['121', '', '1238', '', 'xyz', '', '123abc', '', 'abc123']

The second one is a slight improvement over the first. But if that is the way to go, I would need to list all possible combinations manually, something like this (with more combinations):

re.split("\n\s|\s\n|\s\n\s|\n|,\s|\s,|\s,\s|,|\s|\r", x)

Output:

['121', '1238', 'xyz', '', '123abc', '', 'abc123']

Is there any better way to do this?

RatDon · Accepted Answer

Combining @Johnny Mopp's and @alfinkel24's comments:

re.split(r"[\s,]+", x)

This splits the string as required into

['121', '1238', 'xyz', '123abc', 'abc123']
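For reference, here is a minimal runnable sketch of the call above, using the input string from the question. The helper name split_tokens is hypothetical and only there for illustration. It also shows one edge case worth knowing: if the string starts or ends with a delimiter, re.split returns an empty string at that end, which you may want to filter out.

```python
import re

x = "121, 1238,\nxyz,\n 123abc \n\nabc123"

# Raw string so Python passes the backslash in \s through to the regex engine.
tokens = re.split(r"[\s,]+", x)
print(tokens)  # ['121', '1238', 'xyz', '123abc', 'abc123']


def split_tokens(text):
    """Hypothetical helper: split on runs of whitespace/commas, dropping empty strings."""
    return [tok for tok in re.split(r"[\s,]+", text) if tok]


# A leading or trailing delimiter makes re.split return an empty string at that end.
print(re.split(r"[\s,]+", ", 121, 1238 "))  # ['', '121', '1238', '']
print(split_tokens(", 121, 1238 "))         # ['121', '1238']
```

If the input never starts or ends with a delimiter, the plain re.split call is enough and the filter is unnecessary.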

Explanation:

[...] matches any one of the characters inside the brackets.
+ matches one or more repetitions of the preceding element.
\s matches any whitespace character, including "\t", "\n" and "\r".
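To see what the + contributes, it may help to run the same character class with and without it on the string from the question. This is only an illustrative sketch; the variable names single and runs are made up for the comparison.

```python
import re

x = "121, 1238,\nxyz,\n 123abc \n\nabc123"

# Without "+", every single delimiter character is its own split point,
# so adjacent delimiters leave empty strings between them.
single = re.split(r"[\s,]", x)
print(single)
# ['121', '', '1238', '', 'xyz', '', '', '123abc', '', '', 'abc123']

# With "+", a whole run of delimiter characters counts as one split point.
runs = re.split(r"[\s,]+", x)
print(runs)
# ['121', '1238', 'xyz', '123abc', 'abc123']
```

The first result is the same list the question's first attempt produced, which suggests the missing piece there was the quantifier rather than more alternatives.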

Official documentation:

\s
For Unicode (str) patterns: Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages). If the ASCII flag is used, only [ \t\n\r\f\v] is matched.
For 8-bit (bytes) patterns: Matches characters considered whitespace in the ASCII character set; this is equivalent to [ \t\n\r\f\v].
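The Unicode versus ASCII distinction in the documentation can matter for splitting. As a small sketch, assuming an input that contains a non-breaking space (the sample text below is made up, not from the question), the same pattern behaves differently once re.ASCII is passed:

```python
import re

# "\u00a0" is a non-breaking space: Unicode whitespace, but not ASCII whitespace.
text = "121\u00a01238, xyz"

unicode_split = re.split(r"[\s,]+", text)
ascii_split = re.split(r"[\s,]+", text, flags=re.ASCII)

print(unicode_split)  # ['121', '1238', 'xyz']
print(ascii_split)    # ['121\xa01238', 'xyz']
```

For ordinary str input the default (Unicode) behaviour is usually what you want; re.ASCII is only relevant if non-ASCII whitespace should be kept inside tokens.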
