Erik

Reputation: 7741

Resetting csv.DictReader(StringIO.StringIO(some_string))

I am using Python's csv.DictReader, but I am initializing it with a string like this:

dict_reader = csv.DictReader(StringIO.StringIO(some_string))

Is there a way to reset the DictReader's iterator so that I can use it multiple times? I would rather not re-parse some_string as it can be an expensive operation.

Upvotes: 1

Views: 2829

Answers (1)

crayzeewulf

Reputation: 6020

As you probably already know, the initialization:

dict_reader = csv.DictReader(StringIO.StringIO(some_string))

does not actually read anything from the StringIO.StringIO instance. The dict_reader starts reading only when you start pulling rows from it, and it reads the input line by line. In other words, it reads only as many lines as it needs for the rows you request (plus the header line, which it consumes when you request the first row). Here is an example:

#! /usr/bin/env python
import csv
try:
    from StringIO import StringIO   # Python 2.x
except ImportError:
    from io import StringIO         # Python 3.x

test_string = """name,value
foo,1
bar,2
"""

string_io = StringIO(test_string)
# 
# Position is 0 i.e. the beginning of the string.
# 
print("Position: {}".format(string_io.tell()))

dict_reader = csv.DictReader(string_io)
#
# Position is still 0. Nothing has been read.
#
print("Position: {}".format(string_io.tell()))
#
# Now we start reading from string_io
#
for row in dict_reader:
    print(row)
    #
    # Position increases every time you read 
    # a row using dict_reader.
    #
    print("Position: {}".format(string_io.tell()))

This will print:

Position: 0
Position: 0
{'name': 'foo', 'value': '1'}
Position: 17
{'name': 'bar', 'value': '2'}
Position: 23

At the end of all this, the current position in string_io is at the end of the string. So even if you could reuse dict_reader, you would first have to seek to the beginning of string_io and scan it all over again. In fact, you can do the following after the above code:

string_io.seek(0)
for row in dict_reader:
    print(row)
    print("Position: {}".format(string_io.tell()))

This for loop will print the following:

{'name': 'name', 'value': 'value'}
Position: 11
{'name': 'foo', 'value': '1'}
Position: 17
{'name': 'bar', 'value': '2'}
Position: 23

Notice that dict_reader now treats the first line of string_io as data rather than using it to determine the field names, because it already read the field names on the first pass. Furthermore, dict_reader itself does not keep the lines that it has scanned: once a row has been handed to you, it is no longer available via dict_reader. You can see this from the definition of csv.DictReader.next() (__next__() in Python 3) in csv.py and Reader_iternext() in _csv.c. So you are better off storing the rows somewhere yourself, as suggested in the comments.
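
For illustration, here is a minimal sketch of storing the rows yourself, assuming the parsed rows fit comfortably in memory. The name rows is only a placeholder chosen for this example, and test_string is the same sample data used above:

```python
import csv
try:
    from StringIO import StringIO   # Python 2.x
except ImportError:
    from io import StringIO         # Python 3.x

test_string = """name,value
foo,1
bar,2
"""

# Parse the string exactly once and keep every row in a list.
# Each element is a separate dict keyed by the header names.
rows = list(csv.DictReader(StringIO(test_string)))

# The cached list can be traversed as many times as needed;
# no further parsing of test_string takes place.
for _ in range(2):
    for row in rows:
        print("{} = {}".format(row["name"], row["value"]))
```

If the data is too large to cache, the alternative is to call string_io.seek(0) and then construct a new csv.DictReader(string_io). A fresh reader consumes the header line again, so the field names come out right, but it also re-parses every row.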

Upvotes: 6
