Reputation: 711
I am dealing with a big dataset, so to read it in pandas I use read_csv with the chunksize=
option:
data = pd.read_csv("dataset.csv", chunksize=200000)
Then I operate on the chunks in the following way:
any_na_cols = [chunk.do_something() for chunk in data]
The problem is that when I want to do something else in the same way as above, I get an empty result, because I have already iterated over the chunks. I would therefore have to call data = pd.read_csv("dataset.csv", chunksize=200000)
again to perform the next operation.
Most likely there is no problem with that, but for some reason this approach feels inelegant to me. Isn't there a method like data.rewind()
or something similar that would let me iterate through the chunks again? I could not find anything like that in the documentation. Or am I committing a design mistake with this approach?
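For context, the object returned by read_csv with chunksize= is a one-shot iterator (a TextFileReader), so a second pass over it yields nothing. A minimal sketch of the behaviour and the re-open workaround, using an in-memory CSV as a stand-in for dataset.csv:

```python
import io
import pandas as pd

CSV_TEXT = "a,b\n1,2\n3,4\n5,6\n"

def chunks():
    # Build a fresh reader each call; the reader itself cannot be rewound.
    return pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2)

first_pass = [len(c) for c in chunks()]   # chunk sizes on the first pass
second_pass = [len(c) for c in chunks()]  # works because we re-created the reader

print(first_pass, second_pass)  # [2, 1] [2, 1]
```

Wrapping the reader construction in a small function like chunks() above at least hides the repetition, but each call still re-reads the file from disk.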
Upvotes: 2
Views: 613
Reputation: 210882
I don't think it's a good idea to read your CSV again - you would double the I/O. It's better to "do something else" during the same iteration:
any_na_cols = pd.DataFrame()
for chunk in pd.read_csv("dataset.csv", chunksize=200000):
    any_na_cols = pd.concat([any_na_cols, chunk.do_something()], ignore_index=True)
    # do something else with the same chunk here
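As a variation on the same single-pass idea: calling pd.concat inside the loop copies the accumulated frame on every iteration, so a common pattern is to collect per-chunk results in a list and concatenate once at the end. A sketch, with chunk.isna().any() standing in for your do_something() and an in-memory CSV standing in for dataset.csv:

```python
import io
import pandas as pd

CSV_TEXT = "a,b\n1,2\n3,4\n5,6\n"

pieces = []
total_a = 0
for chunk in pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2):
    pieces.append(chunk.isna().any())  # stand-in for chunk.do_something()
    total_a += chunk["a"].sum()        # "do something else" in the same pass

result = pd.concat(pieces, ignore_index=True)
print(len(result), total_a)
```

Both computations finish in a single read of the file, which is the point of the answer above.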
Upvotes: 2