Reputation: 37068
...note that values will be delimited by one or more space or TAB characters
How can I use the split() method if there are multiple separating characters of different types, as in this case?
Upvotes: 1
Views: 1159
Reputation: 21
I had the same problem with some strings separated by different whitespace characters, and used \s as described in the regular expression library documentation.
\s matches any whitespace character; for ASCII text this is equivalent to the set [ \t\n\r\f\v].
You will need to import re, the regular expression module:
import re
line = "something separated\t by \t\t\t different \t things"
workstr = re.sub(r'\s+', '\t', line)
So, any whitespace or separator (\s) repeated one or more times (+) is transformed to a single tab (\t), which you can then process with split('\t'):
# workstr is now "something\tseparated\tby\tdifferent\tthings"
newline = workstr.split('\t')
# newline is now ['something', 'separated', 'by', 'different', 'things']
Upvotes: 2
Reputation: 3970
For whitespace delimiters, str.split() already does what you may want. From the Python Standard Library,
str.split([sep[, maxsplit]])
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
For example, ' 1 2 3 '.split() returns ['1', '2', '3'], and ' 1 2 3 '.split(None, 1) returns ['1', '2 3 '].
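To make the edge cases in that quote concrete, here is a small sketch. The variable names are invented for illustration. It contrasts the no-argument form with an explicit separator on empty and whitespace-only input.

```python
# No separator: empty or whitespace-only input yields an empty list.
from_empty = "".split()
from_blank = " \t ".split()

# Explicit separator: an empty string still yields one (empty) field.
from_empty_explicit = "".split("\t")

print(from_empty, from_blank, from_empty_explicit)  # [] [] ['']
```

This difference matters if you count fields per line: with an explicit separator, a blank line still looks like it has one field.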
Upvotes: 1
Reputation: 17173
By default, split() handles multiple kinds of whitespace. I'm not sure it's enough for what you need, but try it:
>>> s = "a \tb c\t\t\td"
>>> s.split()
['a', 'b', 'c', 'd']
It certainly works for multiple spaces and tabs mixed.
Upvotes: 2
Reputation: 3539
Split using regular expressions and not just one separator:
http://docs.python.org/2/library/re.html
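A minimal sketch of that approach, assuming the delimiters are only spaces and tabs, as the question states. The sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample line: fields separated by runs of spaces and/or tabs.
line = "alpha \t beta\t\tgamma   delta"

# [ \t]+ matches one or more space or TAB characters,
# so each run of mixed delimiters counts as a single separator.
fields = re.split(r"[ \t]+", line)
print(fields)  # ['alpha', 'beta', 'gamma', 'delta']
```

Unlike str.split() with no argument, re.split() returns empty strings when the line starts or ends with a delimiter, so you may want to call line.strip() first.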
Upvotes: 2
Reputation: 5115
You can use regular expressions first:
import re
re.sub(r'\s+', ' ', 'text with whitespace etc').split()
['text', 'with', 'whitespace', 'etc']
Upvotes: 1
Reputation: 318
Do a text substitution first, then split.
For example, replace all tabs with spaces, then split on spaces.
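A rough sketch of the substitute-then-split idea, with a made-up sample string. One caveat: after the replacement, a run of several delimiters becomes several spaces, so splitting on a single space leaves empty strings that have to be filtered out (or avoided by calling split() with no argument).

```python
# Hypothetical sample line mixing single and repeated delimiters.
line = "one\ttwo  three\t\tfour"

# Step 1: turn every tab into a space.
spaced = line.replace("\t", " ")

# Step 2: split on a single space; repeated spaces produce empty fields.
naive = spaced.split(" ")
print(naive)  # ['one', 'two', '', 'three', '', 'four']

# Step 3: drop the empty fields.
fields = [field for field in naive if field]
print(fields)  # ['one', 'two', 'three', 'four']
```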
Upvotes: 1