Reputation: 6242
I have the following text:
text = 'This is "a simple" test'
And I need to split it in two ways, first by quotes and then by spaces, resulting in:
res = ['This', 'is', '"a simple"', 'test']
But with str.split() I'm only able to use either quotes or spaces as the delimiter. Is there a built-in function that splits on multiple delimiters?
Upvotes: 7
Views: 5372
Reputation: 78554
You can use shlex.split, which is handy for parsing quoted strings:
>>> import shlex
>>> text = 'This is "a simple" test'
>>> shlex.split(text, posix=False)
['This', 'is', '"a simple"', 'test']
Doing this in non-POSIX mode prevents the inner quotes from being removed from the split result. posix is set to True by default:
>>> shlex.split(text)
['This', 'is', 'a simple', 'test']
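One reason to drop down to the shlex.shlex class that shlex.split() builds on: shlex.split() also treats single quotes as quote characters, so an apostrophe in text such as don't raises ValueError ("No closing quotation"). A minimal sketch, assuming only double quotes should group words (split_keeping_apostrophes is a hypothetical helper name, not part of shlex):

```python
import shlex


def split_keeping_apostrophes(text, keep_quotes=False):
    # posix=False keeps the surrounding double quotes in the tokens,
    # the same way shlex.split(text, posix=False) does.
    lexer = shlex.shlex(text, posix=not keep_quotes)
    lexer.whitespace_split = True  # split only on whitespace, like shlex.split()
    lexer.commenters = ''          # don't treat '#' as the start of a comment
    lexer.quotes = '"'             # only double quotes group words together
    return list(lexer)


text = 'This is "a simple" test'
print(split_keeping_apostrophes(text))                    # ['This', 'is', 'a simple', 'test']
print(split_keeping_apostrophes(text, keep_quotes=True))  # ['This', 'is', '"a simple"', 'test']
print(split_keeping_apostrophes('don\'t split "this part"'))  # ["don't", 'split', 'this part']
```

The settings mirror what shlex.split() does internally (whitespace_split on, comments off); the only change is restricting quotes to the double-quote character.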
If you have multiple lines of this type of text or you're reading from a stream, you can split efficiently (the quotes are excluded from the output) using csv.reader:
import io
import csv

text = 'This is "a simple" test'
s = io.StringIO(text)  # in-memory stream
f = csv.reader(s, delimiter=' ', quotechar='"')
print(list(f))
# [['This', 'is', 'a simple', 'test']]
On Python 2, decode the byte string to unicode first, i.e. io.StringIO(text.decode('utf8')). On Python 3 this isn't needed, since all strings are already unicode.
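To illustrate the multi-line case, here is a small sketch with made-up input: csv.reader turns each line of the stream into its own row, handling the quoted phrase on every line.

```python
import csv
import io

# Hypothetical multi-line input; each line becomes one row.
data = 'This is "a simple" test\nAnother "quoted phrase" here\n'

rows = list(csv.reader(io.StringIO(data), delimiter=' ', quotechar='"'))
for row in rows:
    print(row)
# ['This', 'is', 'a simple', 'test']
# ['Another', 'quoted phrase', 'here']
```

Note that with delimiter=' ', two consecutive spaces produce an empty field between them, so this works best on text with single spaces.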
Upvotes: 13
Reputation: 117
You can look into the shlex library.
from shlex import split
a = 'This is "a simple" text'
split(a)
['This', 'is', 'a simple', 'text']
I don't think a regex is what you're looking for.
Upvotes: 0
Reputation: 1709
Using csv.reader:
import csv
text = 'This is "a simple" test'
list_text = []
list_text.append(text)
for row in csv.reader(list_text, delimiter=" "):
    print(row)
# ['This', 'is', 'a simple', 'test']
You can also read more about it here.
Upvotes: 0
Reputation: 105
Try using re:
import re
text = 'This is "a simple" test'
print(re.split(r'"|\s', text))
The result:
['This', 'is', '', 'a', 'simple', '', 'test']
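The split above breaks the quoted phrase apart and leaves empty strings where a quote sits next to a space. If the goal is the exact output from the question, a minimal sketch swaps re.split for re.findall and matches the tokens instead of the delimiters:

```python
import re

text = 'This is "a simple" test'

# Match either a double-quoted run (quotes included) or a run of
# non-whitespace characters; findall returns the matches in order.
tokens = re.findall(r'"[^"]*"|\S+', text)
print(tokens)  # ['This', 'is', '"a simple"', 'test']
```

This assumes quotes only appear at the start of a token; for input like foo"bar baz", the \S+ branch would grab foo"bar before the quoted branch gets a chance.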
Upvotes: 0
Reputation: 639
If I understand you right, you can use a regex:
>>> import re
>>> text = 'This is "a simple" test'
>>> re.split(r'\s|"', text)
['This', 'is', '', 'a', 'simple', '', 'test']
Upvotes: 1
Reputation: 11560
For your case, shlex.split will do just fine.
As for the question about multiple delimiters:
import re
re.split(r'"|\s', text)
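For the general multiple-delimiter case, a small helper can build the alternation from a list of literal delimiters. This is a sketch: split_on is a hypothetical name, not a standard-library function, and it assumes empty pieces should be discarded.

```python
import re


def split_on(text, delimiters):
    # Escape each delimiter so characters like '|' or '.' are matched literally,
    # then join them into one alternation pattern.
    pattern = '|'.join(re.escape(d) for d in delimiters)
    # Drop the empty strings re.split produces between adjacent delimiters.
    return [part for part in re.split(pattern, text) if part]


print(split_on('This is "a simple" test', ['"', ' ']))  # ['This', 'is', 'a', 'simple', 'test']
print(split_on('a,b;c|d', [',', ';', '|']))              # ['a', 'b', 'c', 'd']
```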
Upvotes: 1