Reputation: 7741
I am using Python's csv.DictReader, but I am initializing it with a string like this:
dict_reader = csv.DictReader(StringIO.StringIO(some_string))
Is there a way to reset the DictReader's iterator so that I can use it multiple times? I would rather not re-parse some_string as it can be an expensive operation.
Upvotes: 1
Views: 2829
Reputation: 6020
As you probably already know, the initialization:
dict_reader = csv.DictReader(StringIO.StringIO(some_string))
does not actually read anything from the StringIO.StringIO instance. The dict_reader starts reading only when you start requesting rows from it, and it reads the input line by line. In other words, it reads only as many lines as it needs to produce the rows you ask for. Here is an example:
#! /usr/bin/env python
import csv
try:
    from StringIO import StringIO  # Python 2.x
except ImportError:
    from io import StringIO  # Python 3.x
test_string = """name,value
foo,1
bar,2
"""
string_io = StringIO(test_string)
#
# Position is 0 i.e. the beginning of the string.
#
print("Position: {}".format(string_io.tell()))
dict_reader = csv.DictReader(string_io)
#
# Position is still 0. Nothing has been read.
#
print("Position: {}".format(string_io.tell()))
#
# Now we start reading from string_io
#
for row in dict_reader:
    print(row)
    #
    # Position increases every time you read
    # a row using dict_reader.
    #
    print("Position: {}".format(string_io.tell()))
This will print:
Position: 0
Position: 0
{'name': 'foo', 'value': '1'}
Position: 17
{'name': 'bar', 'value': '2'}
Position: 23
At the end of all this, the current position in string_io points to the end of the string. So, even if you could reuse dict_reader, you would have to seek to the beginning of string_io first and scan it all over again. In fact, you can do the following after the above code:
string_io.seek(0)
for row in dict_reader:
    print(row)
    print("Position: {}".format(string_io.tell()))
This for loop will print the following:
{'name': 'name', 'value': 'value'}
Position: 11
{'name': 'foo', 'value': '1'}
Position: 17
{'name': 'bar', 'value': '2'}
Position: 23
Notice that dict_reader now treats the first line of string_io as data rather than using it to decide the names of the fields, because it already took the field names from the header during the first pass. Furthermore, dict_reader itself does not keep the lines that it has scanned. Once a row is passed to you, it is no longer available via dict_reader. You can see this from the definition of csv.DictReader.next() (__next__() in Python 3) in csv.py and Reader_iternext() in _csv.c. So, you are better off storing the rows somewhere yourself, as suggested in the comments.
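A minimal sketch of storing the rows yourself, assuming the parsed rows fit in memory: build a list from the DictReader once, then iterate over the list as many times as needed. The sample input and the names rows, first_pass and second_pass are made up for illustration.

```python
import csv
from io import StringIO  # Python 3; on Python 2 use `from StringIO import StringIO`

# Hypothetical sample input standing in for some_string.
some_string = """name,value
foo,1
bar,2
"""

# Parse once and keep every row in memory.
rows = list(csv.DictReader(StringIO(some_string)))

# The list can be iterated any number of times without re-parsing.
first_pass = [row["name"] for row in rows]
second_pass = [row["value"] for row in rows]
print(first_pass)
print(second_pass)
```

The trade-off is memory: the list holds every row at once, so for very large inputs re-creating the reader may still be the better option.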
Upvotes: 6