Reputation: 2527
I've seen quite a few questions on how to segment a dataframe into chunks. What I want to know is how to convert a dataframe into exactly the same object you get when loading a CSV file with the chunksize parameter, i.e.:
df = pd.read_csv(file_path, chunksize=1e5)
type(df)
>> pandas.io.parsers.TextFileReader
I want to build an identical TextFileReader object from an existing dataframe, so that it yields that dataframe's data in chunks. Any ideas on how to do this?
Upvotes: 1
Views: 7670
Reputation: 92854
With the text stream object StringIO and the pd.read_csv function (df below contains a sample dataframe):
In [216]: df
Out[216]:
Date Name Wage
0 5/1/19 Joe $100
1 5/1/19 Sam $120
2 5/1/19 Kate $30
3 5/2/19 Joe $120
4 5/2/19 Sam $134
5 5/2/19 Kate $56
6 5/3/19 Joe $89
7 5/3/19 Sam $90
8 5/3/19 Kate $231
In [217]: from io import StringIO  # pandas.compat.StringIO was removed in pandas 0.25
In [218]: reader = pd.read_csv(StringIO(df.to_csv()), iterator=True)
In [219]: type(reader)
Out[219]: pandas.io.parsers.TextFileReader
In [220]: reader.get_chunk(3)
Out[220]:
Unnamed: 0 Date Name Wage
0 0 5/1/19 Joe $100
1 1 5/1/19 Sam $120
2 2 5/1/19 Kate $30
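The extra "Unnamed: 0" column in the output above appears because df.to_csv() writes the row index as an unnamed first column. A minimal sketch of one way to avoid it, assuming the index does not need to survive the round trip, is to pass index=False when serialising (the small sample frame here is only an illustration):

```python
from io import StringIO

import pandas as pd

# Illustrative sample data, shaped like the frame shown above
df = pd.DataFrame({
    "Date": ["5/1/19", "5/1/19", "5/1/19", "5/2/19"],
    "Name": ["Joe", "Sam", "Kate", "Joe"],
    "Wage": ["$100", "$120", "$30", "$120"],
})

# index=False keeps the row index out of the CSV text,
# so read_csv does not create an "Unnamed: 0" column
reader = pd.read_csv(StringIO(df.to_csv(index=False)), iterator=True)

first = reader.get_chunk(3)  # first three rows as a DataFrame
reader.close()
```

If the index does carry information, passing index_col=0 to pd.read_csv is an alternative that restores it instead of dropping it.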
Of course, you may specify a concrete chunk size via the chunksize option.
iterator : boolean, default False
    Return TextFileReader object for iteration or getting chunks with get_chunk().
chunksize : int, default None
    Return TextFileReader object for iteration.
http://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-chunking
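As a rough sketch of the chunksize variant, the round trip can be wrapped in a small function. The name dataframe_to_reader and the sample data are hypothetical, not part of pandas, and because the data passes through CSV text, dtypes are re-inferred on read and may differ from the original frame:

```python
from io import StringIO

import pandas as pd


def dataframe_to_reader(df, chunksize):
    """Hypothetical helper: serialise df to in-memory CSV and reopen it in chunks."""
    buffer = StringIO(df.to_csv(index=False))
    # With chunksize set, read_csv returns a TextFileReader instead of a DataFrame
    return pd.read_csv(buffer, chunksize=chunksize)


# Illustrative sample data: nine rows
df = pd.DataFrame({
    "Name": ["Joe", "Sam", "Kate"] * 3,
    "Wage": [100, 120, 30, 120, 134, 56, 89, 90, 231],
})

reader = dataframe_to_reader(df, chunksize=4)

# Iterating the reader yields one DataFrame per chunk
chunk_lengths = [len(chunk) for chunk in reader]
print(type(reader).__name__, chunk_lengths)
```

The whole frame is held in memory as text here, so this is useful mainly for feeding an existing dataframe to code that expects a TextFileReader, not for saving memory.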
Upvotes: 2