Reputation: 3543
Input:
x = "121, 1238,\nxyz,\n 123abc \n\rabc123"
I want to split this string on the delimiters ",", "\n", "\r", and any other whitespace (\s in a regex)
to get the output
['121', '1238', 'xyz', '123abc', 'abc123']
Whatever I try, the delimiters are treated as single characters rather than as combinations of characters. For example:
1.
re.split("\n|,|\s|\r", x)
gave the output
['121', '', '1238', '', 'xyz', '', '', '123abc', '', '', 'abc123']
2.
re.split("\n\s|,|\s|\r", x)
gave the output
['121', '', '1238', '', 'xyz', '', '123abc', '', 'abc123']
The second attempt is a slight improvement over the first, but with this approach I would have to list every possible combination of delimiters by hand, something like this (and with even more combinations):
re.split("\n\s|\s\n|\s\n\s|\n|,\s|\s,|\s,\s|,|\s|\r", x)
output:
['121', '1238', 'xyz', '', '123abc', '', 'abc123']
Is there a better way to do this?
Upvotes: 3
Views: 11833
Reputation: 4191
Let re.split take one or more repetitions of any of your delimiter characters as the delimiter:
>>> re.split(r"[,\s]+", x)
['121', '1238', 'xyz', '123abc', 'abc123']
(The '*', '+', and '?' quantifiers are all greedy: they match as much text as they can, so a run such as ",\n " is consumed as a single delimiter.)
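As a runnable sketch of the same idea, assuming Python 3 and only the standard re module. The helper name split_fields is hypothetical and exists only for illustration. The pattern is written as a raw string so that \s reaches the regex engine unchanged, and the filter drops the empty strings that re.split returns when the input starts or ends with a delimiter.

```python
import re

x = "121, 1238,\nxyz,\n 123abc \n\rabc123"

def split_fields(text):
    # Hypothetical helper: split on runs of commas and/or whitespace,
    # then discard empty strings caused by leading/trailing delimiters.
    return [field for field in re.split(r"[,\s]+", text) if field]

print(split_fields(x))               # ['121', '1238', 'xyz', '123abc', 'abc123']
print(split_fields(" ,121, 1238\n")) # ['121', '1238']
```

Without the filter, the second call would return ['', '121', '1238', ''], because the string begins and ends with delimiter characters.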
Upvotes: 1
Reputation: 3543
Combining @Johnny Mopp's and @alfinkel24's comments:
re.split(r"[\s,]+", x)
splits the string as required, giving
['121', '1238', 'xyz', '123abc', 'abc123']
Explanation:
[...] : any one of the characters listed inside the brackets.
+ : one or more repetitions of the preceding element (here, the character class).
\s : any whitespace character, including "\n", "\r" and "\t".
From the Python re documentation on \s:
For Unicode (str) patterns: Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages). If the ASCII flag is used, only [ \t\n\r\f\v] is matched.
For 8-bit (bytes) patterns: Matches characters considered whitespace in the ASCII character set; this is equivalent to [ \t\n\r\f\v].
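To illustrate the difference the ASCII flag makes, here is a small sketch, assuming Python 3. The sample string is made up for this example and contains a non-breaking space (U+00A0), which counts as whitespace for Unicode patterns but not under re.ASCII.

```python
import re

# Made-up sample: "a", a non-breaking space (U+00A0), "b", then ", c".
text = "a\u00a0b, c"

# Default (Unicode) matching: \s also matches the non-breaking space.
unicode_parts = re.split(r"[\s,]+", text)

# With re.ASCII, \s is limited to [ \t\n\r\f\v], so U+00A0 is not a delimiter.
ascii_parts = re.split(r"[\s,]+", text, flags=re.ASCII)

print(unicode_parts)  # ['a', 'b', 'c']
print(ascii_parts)    # ['a\xa0b', 'c']
```

For the input in the question the flag makes no difference, since it contains only ASCII whitespace.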
Upvotes: 3