herbört
herbört

Reputation: 33

How to read a CSV dialect containing a list in square brackets?

I get CSV lines of the form

a,b,c,[1,2,3,4]

and I want to extract

a
b
c
[1,2,3,4]

But with

csv.reader([message], delimiter=',')

I get

a
b
c
[1
2
3
4]

This is not particularly surprising, but I wonder how I can specify a dialect so that the CSV line is correctly parsed. Is this even a valid dialect?

Upvotes: 3

Views: 1963

Answers (1)

Gino Mempin
Gino Mempin

Reputation: 29630

Python's csv.reader accepts a quotechar keyword parameter, which tells it to ignore delimiters (,) inside the quotechar value. By default, it's set to a double-quote ". That can be used to group [1,2,3,4] as one column value. Unfortunately, your "quotes" come as square brackets which come as 2 characters [ and ] and you can't assign them both to quotechar.

The typical trick is to convert to the same quotechar value, either all [ to ] or all ] to [, then call

csv.reader(data, quotechar='[')  # or ']' whichever you replaced everything into

If your input CSV isn't that big and you can read it all in-memory, you can read the entire file, replace all brackets to the same "shape", store the converted lines into an StringIO in-memory text stream, then pass that to csv.reader to read each row:

import csv
from io import StringIO

converted = StringIO()
with open('data.csv') as file:
    converted.write(file.read().replace('[', ']'))
print(converted.getvalue())

converted.seek(0)
reader = csv.reader(converted, quotechar=']')
for row in reader:
    print(row)
    print(*row, sep='\n')
a,b,c,]1,2,3,4]
d,e,f,]4,5,6,7]

['a', 'b', 'c', '1,2,3,4']
a
b
c
1,2,3,4
['d', 'e', 'f', '4,5,6,7']
d
e
f
4,5,6,7

But if you need the square brackets in the parsed contents, as you mentioned:

and I want to extract

a
b
c
[1,2,3,4]

Then one option is to use some other quotechar value (let's say ;) and put it beside each square bracket, like this:

a,b,c,;[1,2,3,4];
d,e,f,;[4,5,6,7];

Then use the same trick to replace all the brackets with bracket+;, and then this time use csv.reader(data, quotechar=';'):

import csv
from io import StringIO

converted = StringIO()
with open('data.csv') as file:
    converted.write(file.read().replace('[', ';[').replace(']', '];'))
print(converted.getvalue())

converted.seek(0)
reader = csv.reader(converted, quotechar=';')
for row in reader:
    print(row)
    print(*row, sep='\n')
a,b,c,;[1,2,3,4];
d,e,f,;[4,5,6,7];

['a', 'b', 'c', '[1,2,3,4]']
a
b
c
[1,2,3,4]
['d', 'e', 'f', '[4,5,6,7]']
d
e
f
[4,5,6,7]

If the CSV is too big though to store all the lines in-memory, you might need to save it back to a new file first, then re-open and re-read it with csv.reader with a proper quotechar.

Upvotes: 3

Related Questions