Reputation: 37068

Splitting strings separated by multiple possible characters?

...note that values will be delimited by one or more space or TAB characters

How can I use the split() method if there are multiple separating characters of different types, as in this case?

Upvotes: 1

Answers (6)

Ricardo Segovia

Reputation: 21

I had the same problem with some strings separated by different whitespace chars, and used \s as shown in the Regular Expressions library specification.

\s matches any whitespace character; without Unicode matching, this is equivalent to the set [ \t\n\r\f\v].

You will need to import re, the regular expression module:

import re
line = "something separated\t by \t\t\t different \t things"
workstr = re.sub(r'\s+', '\t', line)

So any run of one or more (+) whitespace characters (\s) is replaced by a single tab (\t), and you can then split the result with split('\t'):

# workstr is now "something\tseparated\tby\tdifferent\tthings"
newline = workstr.split('\t')
# newline is now ['something', 'separated', 'by', 'different', 'things']

Upvotes: 2

dkim

Reputation: 3970

For whitespace delimiters, str.split() already does what you may want. From the Python Standard Library documentation:

str.split([sep[, maxsplit]])

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].

For example, ' 1 2 3 '.split() returns ['1', '2', '3'], and ' 1 2 3 '.split(None, 1) returns ['1', '2 3 '].
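To make the difference between the two splitting algorithms concrete, here is a small sketch. The sample string is made up for illustration; it mixes spaces and tabs and ends with a trailing space.

```python
# Hypothetical sample: values separated by a mix of spaces and tabs.
s = "1 \t 2\t\t3 "

# No separator: runs of any whitespace count as one delimiter,
# and leading/trailing whitespace produces no empty strings.
default_split = s.split()

# Explicit ' ' separator: only single spaces are delimiters,
# so tabs stay in the fields and empty strings can appear.
space_split = s.split(' ')

print(default_split)
print(space_split)
```

With the explicit separator the tabs survive inside the fields, which is usually not what you want for data delimited by "one or more space or TAB characters".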

Upvotes: 1

Rusty Rob

Reputation: 17173

By default, split() handles multiple kinds of whitespace. I'm not sure it's enough for what you need, but try it:

>>> s = "a \tb     c\t\t\td"
>>> s.split()
['a', 'b', 'c', 'd']

It certainly works for multiple spaces and tabs mixed.

Upvotes: 2

Gadi A

Reputation: 3539

Split using regular expressions and not just one separator:

http://docs.python.org/2/library/re.html
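As a minimal sketch of that approach (the sample string and variable names are made up for illustration), re.split takes a pattern instead of a fixed separator, so a character class such as [ \t]+ treats any run of spaces or tabs as one delimiter:

```python
import re

# Hypothetical sample line: fields separated by one or more spaces or tabs.
line = "alpha \t beta\t\tgamma  delta"

# [ \t]+ matches a run of one or more space or TAB characters.
fields = re.split(r'[ \t]+', line)
print(fields)
```

Unlike str.split() with no arguments, re.split returns empty strings at the ends if the line starts or ends with a delimiter, so you may want to call line.strip() first.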

Upvotes: 2

rofls

Reputation: 5115

You can use regular expressions first:

import re
re.sub(r'\s+', ' ', 'text     with    whitespace        etc').split()
# ['text', 'with', 'whitespace', 'etc']

Upvotes: 1

TheRuss

Reputation: 318

Do a text substitution first, then split.

E.g. replace all tabs with spaces, then split on space.
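A rough sketch of the replace-then-split idea, assuming a made-up sample string. Note that splitting on a single space leaves empty strings wherever several delimiters were adjacent, so this sketch filters them out afterwards:

```python
# Hypothetical sample line mixing spaces and tabs.
line = "a \tb  c\t\td"

# Step 1: replace every tab with a space.
spaced = line.replace('\t', ' ')

# Step 2: split on a single space. Runs of spaces yield empty strings,
# so drop them with a filter.
fields = [f for f in spaced.split(' ') if f]
print(fields)
```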

Upvotes: 1
