Reputation: 49
I am struggling to split this string on commas, where any comma inside double quotes should be ignored.
cStr = 'aaaa,bbbb,"ccc,ddd"'
Expected result: ['aaaa', 'bbbb', "ccc,ddd"]
Please help. I tried the different methods mentioned in the solution below, but couldn't resolve the issue. (I am not allowed to use the csv or pyparsing modules.)
A similar question has already been asked, but for the input below:
cStr = '"aaaa","bbbb","ccc,ddd"'
result = ['"aaaa"', '"bbbb"', '"ccc,ddd"']
Upvotes: 1
Views: 465
Reputation: 354
This can be achieved in three steps-
cstr = 'aaaa,bbbb,"ccc,ddd","eee,fff,ggg"'
Step 1-
X = cstr.split(',"')
Step 2-
regular_list = [i if '"' in i else i.split(",") for i in X ]
Step 3-
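final_list = []
for i in regular_list:
    if isinstance(i, list):
        # unquoted piece, already split into individual fields in Step 2
        final_list.extend(i)
    else:
        # quoted piece: restore the opening quote consumed by the split in Step 1
        final_list.append('"' + i)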
Final output -
['aaaa', 'bbbb', '"ccc,ddd"', '"eee,fff,ggg"']
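Note that the three steps above appear to rely on the quoted fields coming after all the unquoted ones; an input such as 'aaaa,"ccc,ddd",bbbb' leaves '"ccc,ddd",bbbb' as one item. If that matters, a character-by-character scan is one alternative that needs no imports. The following is only a minimal sketch: the function name split_outside_quotes and its parameters are made up for illustration, and it assumes quotes are balanced and never escaped.

```python
def split_outside_quotes(text, sep=",", quote='"'):
    """Split text on sep, ignoring any sep that sits inside a quoted run."""
    fields = []
    current = []          # characters of the field being built
    in_quotes = False
    for ch in text:
        if ch == quote:
            in_quotes = not in_quotes   # entering or leaving a quoted run
            current.append(ch)          # keep the quote, as in the output above
        elif ch == sep and not in_quotes:
            fields.append("".join(current))  # field boundary
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))  # the last field has no trailing separator
    return fields


print(split_outside_quotes('aaaa,"ccc,ddd",bbbb,"eee,fff,ggg"'))
# ['aaaa', '"ccc,ddd"', 'bbbb', '"eee,fff,ggg"']
```

Unlike splitting on ',"', this keeps track of whether the scan is currently inside quotes, so the position of the quoted fields should not matter.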
Upvotes: 0
Reputation: 18106
You could use list comprehension, no other libraries needed:
cStr = 'aaaa,bbbb,"ccc,ddd"'
# Split on ',"' first, then split each piece on ',' unless it ends with a double quote
l = [
    item.split(',') if not item.endswith('"') else [item[:-1]]
    for item in cStr.split(',"')
]
print(sum(l, []))
Out:
['aaaa', 'bbbb', 'ccc,ddd']
Upvotes: 0
Reputation: 521178
The usual way I handle this is with a regex alternation that tries to match a double-quoted term first, and only then falls back to an unquoted CSV term:
import re
cStr = 'aaaa,bbbb,"ccc,ddd"'
matches = re.findall(r'(".*?"|[^,]+)', cStr)
print(matches) # ['aaaa', 'bbbb', '"ccc,ddd"']
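The question's expected result has no surrounding quotes on the quoted field. One possible variation, offered as a sketch rather than a definitive solution, is to put a capture group inside the quotes and a second group on the unquoted branch, then keep whichever group matched. The helper name split_csv_line is hypothetical, and the pattern assumes there are no escaped quotes inside a quoted field; like the pattern above, it also skips empty unquoted fields (as in 'a,,b').

```python
import re


def split_csv_line(line):
    # Group 1 captures the contents of a quoted field (without the quotes),
    # group 2 captures an unquoted field. With two groups, findall returns
    # one (group1, group2) tuple per match, with '' for the group that did
    # not take part.
    pairs = re.findall(r'"([^"]*)"|([^,]+)', line)
    return [quoted or unquoted for quoted, unquoted in pairs]


print(split_csv_line('aaaa,bbbb,"ccc,ddd"'))
# ['aaaa', 'bbbb', 'ccc,ddd']
```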
Upvotes: 2