Elec

Reputation: 61

Split string only after double quotes in python

I have a string like this:

"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"

And yes, the double quotes are part of the string.

Now I want to split this string into several parts with mystring.split(","). What I get is this:

"BLAX"

"BLAY"

"BLAZ

BLUBB"

"BLAP"

But what I want is this:

"BLAX"

"BLAY"

"BLAZ, BLUBB"

"BLAP"

How can I achieve this while also keeping the double quotes? I need this because I work with TOML files.

Solution: Thanks @Giacomo Alzetta

I used re.split with the regular expression from the answer. Thanks also for explaining it!

Upvotes: 2

Views: 802

Answers (6)

Andy L.

Reputation: 25269

You can replace the separator and then split. This assumes that | never occurs in the data:

s.replace('", ', '"|').split('|')

Out[672]: ['"BLAX"', '"BLAY"', '"BLAZ, BLUBB"', '"BLAP"']

Upvotes: 0

Rakesh

Reputation: 82795

You can also use the csv module.

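Example:

import csv

s = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'
# csv.reader expects an iterable of lines, so wrap the string in a list;
# skipinitialspace=True makes the parser ignore the space after each comma.
r = csv.reader([s], delimiter=',', quotechar='"', skipinitialspace=True)
res = next(r)  # the single parsed row
print(res)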

Output:

['BLAX', 'BLAY', 'BLAZ, BLUBB', 'BLAP']
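
Note that csv strips the double quotes, whereas the question asks to keep them. A minimal sketch of putting them back, assuming no field contains an embedded double quote (the helper name split_quoted is made up for illustration):

```python
import csv

def split_quoted(line):
    """Split a comma-separated line of quoted fields, keeping the quotes."""
    # csv.reader expects an iterable of lines, so wrap the string in a list.
    # skipinitialspace=True lets a quote that follows ", " open a quoted field.
    fields = next(csv.reader([line], skipinitialspace=True))
    # Re-wrap each field; assumes fields contain no embedded double quotes.
    return ['"{}"'.format(f) for f in fields]

print(split_quoted('"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'))
```

If the fields can contain double quotes themselves, the simple re-wrapping above would not be enough and the quotes would need escaping.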

Upvotes: 2

h4z3

Reputation: 5478

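As I said in the comments, you can split on a separator that is longer than a single character. A bare comma matches both the commas inside the quotes and the ones between items, but we can split on '", ' instead (quote, comma, space; the space is included so that we don't have to strip it afterwards ;) )

Then we add back the missing closing quotes: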

original = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'
[s if s.endswith('"') else s+'"' for s in original.split('", ')]

Output: ['"BLAX"', '"BLAY"', '"BLAZ, BLUBB"', '"BLAP"']

This approach doesn't use regexes, so it is likely to be faster for a simple case like this one. You also don't need to work out which regex is correct for your case (I generally like regexes, but I like smart splitting and plain string operations more).
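
The speed claim is easy to check on your own machine. The following is only a rough sketch: the function names are invented for illustration, the regex variant is one possible formulation, and the timings will vary by Python version and input.

```python
import re
import timeit

original = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'

def split_plain(s):
    # Split on quote-comma-space, then restore the closing quote that was consumed.
    return [p if p.endswith('"') else p + '"' for p in s.split('", ')]

def split_regex(s):
    # Split on a comma preceded by a quote, then strip the leftover spaces.
    return [p.strip() for p in re.split(r'(?<="),', s)]

# Both variants should agree before their timings are compared.
assert split_plain(original) == split_regex(original)

print(timeit.timeit(lambda: split_plain(original), number=100_000))
print(timeit.timeit(lambda: split_regex(original), number=100_000))
```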

Upvotes: 1

Andrej Kesely

Reputation: 195593

You can use ast.literal_eval and then add '"' manually:

s = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'

from ast import literal_eval

data = literal_eval('(' + s + ')')

for d in data:
    print('"{}"'.format(d))

Prints:

"BLAX"
"BLAY"
"BLAZ, BLUBB"
"BLAP"
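
A variant of the same idea, sketched with the json module instead of ast.literal_eval. It assumes every item is a valid JSON string (double-quoted, JSON-style escapes), which may or may not hold for your data:

```python
import json

s = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'

# Wrap in brackets so the text parses as a JSON array of strings.
data = json.loads('[' + s + ']')

# json.dumps re-adds the double quotes and escapes any embedded ones.
quoted = [json.dumps(d) for d in data]
print(quoted)
```

Unlike formatting with '"{}"', json.dumps also escapes a double quote or backslash inside an item, so the result can be parsed again.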

Upvotes: 2

Adam.Er8

Reputation: 13413

You can split on ", remove the unwanted leftovers (the empty strings and the ', ' separators), and re-wrap everything in quotes with a simple list comprehension.

string = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'

parts = ['"{}"'.format(s) for s in string.split('"') if s not in ('', ', ')]

for p in parts:
    print(p)

Output:

"BLAX"
"BLAY"
"BLAZ, BLUBB"
"BLAP"
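
One caveat: the filter also drops an item that is empty ("") or whose content is exactly ', '. A possible alternative, swapping the split for re.findall, is to extract the quoted items directly. This is only a sketch and assumes the items contain no escaped double quotes:

```python
import re

# Same data as above, plus an empty item to show that it survives.
string = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP", ""'

# Match an opening quote, any run of non-quote characters, and a closing quote.
parts = re.findall(r'"[^"]*"', string)

for p in parts:
    print(p)
```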

Upvotes: 1

Giacomo Alzetta

Reputation: 2479

You can use a regular expression and the re.split function:

>>> import re
>>> re.split(r'(?<="),', '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"')
['"BLAX"', ' "BLAY"', ' "BLAZ, BLUBB"', ' "BLAP"']

(?<=") is a positive lookbehind: the comma must be preceded by ", but the " is not part of the match, so only the , is consumed by the split.

You could also split on ", but then you would have to restore the " that is now missing from the parts:

>>> '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'.split('",')
['"BLAX', ' "BLAY', ' "BLAZ, BLUBB', ' "BLAP"']
>>> [el + ('' if el.endswith('"') else '"') for el in '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'.split('",')]
['"BLAX"', ' "BLAY"', ' "BLAZ, BLUBB"', ' "BLAP"']
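
Both results above keep the leading space on every part after the first. If that is unwanted, one option is to let the pattern consume the whitespace after the comma as well. A small sketch, assuming the items never contain an escaped double quote followed by a comma:

```python
import re

s = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'

# (?<=") requires a quote just before the comma; \s* also swallows
# any whitespace that follows it, so no stripping is needed afterwards.
parts = re.split(r'(?<="),\s*', s)
print(parts)
```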

Upvotes: 1
