wasp256

Reputation: 6242

python split text by quotes and spaces

I have the following text:

text = 'This is "a simple" test'

And I need to split it in two ways, first by quotes and then by spaces, resulting in:

res = ['This', 'is', '"a simple"', 'test']

But with str.split() I'm only able to use either quotes or spaces as delimiters. Is there a built-in function for multiple delimiters?

Upvotes: 7

Views: 5372

Answers (6)

Moses Koledoye

Reputation: 78554

You can use shlex.split, handy for parsing quoted strings:

>>> import shlex
>>> text = 'This is "a simple" test'
>>> shlex.split(text, posix=False)
['This', 'is', '"a simple"', 'test']

Using non-POSIX mode keeps the inner quotes in the split result. posix is True by default, which strips them:

>>> shlex.split(text)
['This', 'is', 'a simple', 'test']

If you have multiple lines of this type of text or you're reading from a stream, you can split efficiently (excluding the quotes in the output) using csv.reader:

import csv
import io

s = io.StringIO(text)  # in-memory text stream
f = csv.reader(s, delimiter=' ', quotechar='"')
print(list(f))
# [['This', 'is', 'a simple', 'test']]

On Python 3 all strings are already unicode, so no decoding is needed. On Python 2, pass text.decode('utf8') to io.StringIO instead.
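To illustrate the multi-line case mentioned above, here is a minimal sketch for Python 3. The second input line and the variable names are made up for the example; csv.reader yields one list per input line, with the quotes removed.

```python
import csv
import io

# Two lines of space-separated text; the second line is invented for illustration.
lines = 'This is "a simple" test\nAnother "quoted phrase" here\n'

stream = io.StringIO(lines)  # stands in for a file or other text stream
rows = list(csv.reader(stream, delimiter=' ', quotechar='"'))

for row in rows:
    print(row)
```

Each row comes back as its own list, so a file object opened in text mode could be passed in place of the StringIO.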

Upvotes: 13

Rishabh Rusia

Reputation: 117

You can look into the shlex library:

from shlex import split
a = 'This is "a simple" text'
split(a)

['This', 'is', 'a simple', 'text']

I don't think a regex is what you are looking for.

Upvotes: 0

R.A.Munna

Reputation: 1709

Using csv.reader:

import csv
text = 'This is "a simple" test'
# csv.reader accepts any iterable of lines, so wrap the string in a list
for row in csv.reader([text], delimiter=" "):
    print(row)

You can also read more about csv.reader in the Python documentation.

Upvotes: 0

Marcos Rusiñol

Reputation: 105

Try using re:

import re
text = 'This is "a simple" test'
# raw string so that \s is not treated as a string escape
print(re.split(r'"|\s', text))

The result:

['This', 'is', '', 'a', 'simple', '', 'test']
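Note that this result contains empty strings and breaks the quoted phrase apart, so it does not match the output asked for in the question. A minimal sketch of a different regex technique (matching tokens with re.findall rather than splitting on delimiters, which is not part of the original answer), assuming the double quotes are balanced and not nested:

```python
import re

text = 'This is "a simple" test'

# Match either a double-quoted run (quotes kept) or a run of non-whitespace.
# The quoted alternative comes first so it wins at a position starting with ".
tokens = re.findall(r'"[^"]*"|\S+', text)
print(tokens)  # ['This', 'is', '"a simple"', 'test']
```

An unbalanced quote would fall through to the \S+ branch and be split on spaces, so shlex is likely the safer choice for untrusted input.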

Upvotes: 0

Samat Sadvakasov

Reputation: 639

If I understand you correctly, you can use a regex:

>>> import re
>>> text = 'This is "a simple" test'

>>> re.split(r'\s|"', text)

['This', 'is', '', 'a', 'simple', '', 'test']

Upvotes: 1

Rahul

Reputation: 11560

For your case, shlex.split will do just fine.

As for splitting on multiple delimiters:

import re

re.split(r'"|\s', string)
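As a rough, runnable sketch of the multiple-delimiter idea, the pattern below uses a character class so that adjacent delimiters collapse into one split point. The variable names are illustrative, and this drops the quotes and does not keep the quoted phrase together:

```python
import re

text = 'This is "a simple" test'

# Split on any run of double quotes and/or whitespace.
# The filter guards against empty strings when the text starts or ends
# with a delimiter.
parts = [p for p in re.split(r'["\s]+', text) if p]
print(parts)  # ['This', 'is', 'a', 'simple', 'test']
```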

Upvotes: 1
