Reputation: 33
I get CSV lines of the form
a,b,c,[1,2,3,4]
and I want to extract
a
b
c
[1,2,3,4]
But with
csv.reader([message], delimiter=',')
I get
a
b
c
[1
2
3
4]
This is not particularly surprising, but I wonder how I can specify a dialect so that the CSV line is correctly parsed. Is this even a valid dialect?
Upvotes: 3
Views: 1963
Reputation: 29630
Python's csv.reader accepts a quotechar
keyword parameter, which tells it to ignore delimiters (,
) inside the quotechar
value. By default, it's set to a double-quote "
. That can be used to group [1,2,3,4]
as one column value. Unfortunately, your "quotes" come as square brackets which come as 2 characters [
and ]
and you can't assign them both to quotechar
.
The typical trick is to convert to the same quotechar
value, either all [
to ]
or all ]
to [
, then call
csv.reader(data, quotechar='[') # or ']' whichever you replaced everything into
If your input CSV isn't that big and you can read it all in-memory, you can read the entire file, replace all brackets to the same "shape", store the converted lines into an StringIO in-memory text stream, then pass that to csv.reader
to read each row:
import csv
from io import StringIO
converted = StringIO()
with open('data.csv') as file:
converted.write(file.read().replace('[', ']'))
print(converted.getvalue())
converted.seek(0)
reader = csv.reader(converted, quotechar=']')
for row in reader:
print(row)
print(*row, sep='\n')
a,b,c,]1,2,3,4]
d,e,f,]4,5,6,7]
['a', 'b', 'c', '1,2,3,4']
a
b
c
1,2,3,4
['d', 'e', 'f', '4,5,6,7']
d
e
f
4,5,6,7
But if you need the square brackets in the parsed contents, as you mentioned:
and I want to extract
a b c [1,2,3,4]
Then one option is to use some other quotechar
value (let's say ;
) and put it beside each square bracket, like this:
a,b,c,;[1,2,3,4];
d,e,f,;[4,5,6,7];
Then use the same trick to replace all the brackets with bracket+;
, and then this time use csv.reader(data, quotechar=';')
:
import csv
from io import StringIO
converted = StringIO()
with open('data.csv') as file:
converted.write(file.read().replace('[', ';[').replace(']', '];'))
print(converted.getvalue())
converted.seek(0)
reader = csv.reader(converted, quotechar=';')
for row in reader:
print(row)
print(*row, sep='\n')
a,b,c,;[1,2,3,4];
d,e,f,;[4,5,6,7];
['a', 'b', 'c', '[1,2,3,4]']
a
b
c
[1,2,3,4]
['d', 'e', 'f', '[4,5,6,7]']
d
e
f
[4,5,6,7]
If the CSV is too big though to store all the lines in-memory, you might need to save it back to a new file first, then re-open and re-read it with csv.reader
with a proper quotechar
.
Upvotes: 3