Reputation:
Is there an alternative to using the csv module to read a CSV file in Python 3 in a streaming way? Currently my data looks something like this:
"field1"::"field2"::"field3"\x02\n
"1"::"hi\n"::"3"\x02\n
"8"::"ok"::"3"\x02\n
The separator is two characters, :: (the csv module only accepts a single-character separator), and the line separator also contains two characters, \x02\n. Are there any CSV readers for Python that work in a streaming mode and support this?
Here is an example of what I'm trying to do:
>>> import csv
>>> s = '''"field1"::"field2"::"field3"\x02\n\n"1"::"hi\n"::"3"\x02\n\n"8"::"ok"::"3"\x02\n'''
>>> csvreader=csv.reader(s, delimiter='::', lineterminator='\x02\n')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
TypeError: "delimiter" must be a 1-character string
Loading pandas just to read this csv seems like overkill x 100, so I'd like to see what other options there are.
Upvotes: 2
Views: 898
Reputation: 46779
As you have discovered, the csv library cannot handle that data format directly. You could, however, pre-process the data before passing it to csv.reader. For example, the following approach should work:
from io import StringIO
import csv
s = '''"field1"::"field2"::"field3"\x02\n\n"1"::"hi\n"::"3"\x02\n\n"8"::"ok"::"3"\x02\n'''
def csv_reader_alt(source):
    return csv.reader((line.replace('\x02', '').replace('::', ':') for line in source), delimiter=':')

for row in csv_reader_alt(StringIO(s)):
    if row:
        print(row)
Giving you the following output:
['field1', 'field2', 'field3']
['1', 'hi\n', '3']
['8', 'ok', '3']
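One caveat with replacing :: by a plain colon: any field that itself contains a colon (a time such as 12:30, say) would be split in two. A minimal variation, assuming the ASCII unit separator \x1f never occurs in your data, swaps :: for that character instead. The names UNIT_SEP and csv_reader_unit_sep are made up for this sketch:

```python
import csv
from io import StringIO

# Assumption: this control character never appears in the real data.
UNIT_SEP = '\x1f'

def csv_reader_unit_sep(source):
    # Drop the \x02 marker and map the two-character separator to one character.
    cleaned = (line.replace('\x02', '').replace('::', UNIT_SEP) for line in source)
    return csv.reader(cleaned, delimiter=UNIT_SEP)

sample = '"time"::"note"\x02\n"12:30"::"hi\n"\x02\n'
# Skip empty rows, which appear if the input contains blank lines.
rows = [row for row in csv_reader_unit_sep(StringIO(sample)) if row]
print(rows)
```

A :: inside a quoted field would still be rewritten to \x1f, so this remains a workaround rather than a real parser for the format.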
Upvotes: 1
Reputation: 16515
@MartinEvans shows a nice way of doing it in his answer.
Here is code that reads from a file (rather than from a string in memory) with proper file handling, splitting the input on a custom line delimiter by means of a generator:
def get_line(file, delimiter='\n', bufsize=4096):
    # https://stackoverflow.com/a/19600562/9225671
    buf = ''
    while True:
        chunk = file.read(bufsize)
        if len(chunk) == 0:
            # end of file has been reached; serve the remaining data and exit
            yield buf
            return
        buf += chunk
        line_list = buf.split(delimiter)
        # don't serve the last part yet, first we need to read more chunks from the file
        buf = line_list.pop(-1)
        for line in line_list:
            yield line

if __name__ == '__main__':
    with open('my_file.csv') as f:
        for line in get_line(f, delimiter='\x02\n'):
            if len(line) > 0:
                parts = line.split('::')
                print(parts)
                print([
                    e.strip('"')
                    for e in parts])
Does that work for you?
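If a field may itself contain ::, splitting on it is not enough. As a rough sketch, assuming every field is wrapped in double quotes (as in your sample) and that an embedded quote is written doubled (""), which your data may or may not do, the chunked reading can be combined with a regular expression that pulls out the quoted fields. iter_records, parse_record and FIELD_RE are hypothetical names used only here:

```python
import re
from io import StringIO

# Matches one double-quoted field; a doubled quote ("") counts as an escaped quote.
FIELD_RE = re.compile(r'"((?:[^"]|"")*)"')

def iter_records(file, terminator='\x02\n', bufsize=4096):
    """Yield records from a text file object, split on a multi-character terminator."""
    buf = ''
    while True:
        chunk = file.read(bufsize)
        if not chunk:
            # End of input: serve whatever is left, if anything.
            if buf:
                yield buf
            return
        buf += chunk
        # Keep the last (possibly incomplete) piece in the buffer.
        *complete, buf = buf.split(terminator)
        yield from complete

def parse_record(record):
    # Extract the quoted fields and undo the quote doubling.
    return [field.replace('""', '"') for field in FIELD_RE.findall(record)]

sample = StringIO('"field1"::"field2"\x02\n"a::b"::"hi\n"\x02\n')
# A tiny bufsize just to exercise terminators that straddle chunk boundaries.
rows = [parse_record(r) for r in iter_records(sample, bufsize=5) if r.strip()]
print(rows)
```

Because the fields are located by their quotes rather than by the separator, a :: or a newline inside a field is left alone. Unquoted fields would be silently skipped, so this only fits data that is quoted throughout.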
Upvotes: 0