Reputation: 2706
I am trying to read a very large set of data from a CSV file using pandas in Python. The data is too large to take in all at once, so I would like to read half of the rows first and then the other half.
I see that read_csv has a chunksize parameter. However, I cannot figure out how to put everything into a matrix or sparse matrix after it is read.
wow = pd.read_csv('TestingCSV.csv', sep=',', header='infer', low_memory=False, chunksize=10, usecols=(range(3, 5)))
This returns a type: <class 'pandas.io.parsers.TextFileReader'>
What is a possible way to take in the different chunks and then reconstruct a matrix or sparse matrix from them?
Upvotes: 0
Views: 269
Reputation: 15963
By default, read_csv reads the whole file in one call. When you pass chunksize, it instead returns a reader that yields the file in pieces, so you need to take those "chunks" from wow and combine them with concat().
For example:
import pandas as pd
chunks = pd.read_csv(data, chunksize=100)  # data is the path to your CSV file
df = pd.concat(chunks, ignore_index=True)
Now you have the full DataFrame and can do whatever analysis you need.
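To get from that DataFrame to the matrix the question asks about, one option is DataFrame.to_numpy(). Here is a minimal sketch; csv_text is a made-up in-memory stand-in for the real file, and it assumes all selected columns are numeric:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file (illustration only).
csv_text = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n10,11,12\n"

# Read two rows at a time, then stitch the chunks back together.
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)
df = pd.concat(chunks, ignore_index=True)

# Convert the combined DataFrame to a 2-D NumPy array.
matrix = df.to_numpy()
```

Note that this still holds the whole dataset in memory once, so it only helps if the parsing step, rather than the final size, is the bottleneck.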
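The reader is also an iterable object, so instead of concatenating you can process one chunk at a time (it can only be iterated once, so do this in place of concat(), not after it):
for chunk in chunks:
    pass  # do something to each chunk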
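If the combined DataFrame would be too large to hold in memory, one possible approach is to convert each chunk to a sparse matrix as it is read and stack the pieces at the end. This is only a sketch: it assumes SciPy is installed and that the selected columns are numeric, and csv_text is a made-up stand-in for the real file.

```python
import io

import pandas as pd
from scipy import sparse

# Hypothetical stand-in for the real CSV file (illustration only).
csv_text = "a,b\n0,1\n0,0\n2,0\n0,3\n0,0\n"

reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)

# Convert each chunk to a CSR matrix so only one dense chunk exists at a time.
parts = [sparse.csr_matrix(chunk.to_numpy()) for chunk in reader]

# Stack the per-chunk matrices row-wise into a single sparse matrix.
matrix = sparse.vstack(parts, format="csr")
```

This pays off only when the data is mostly zeros; for dense data the sparse representation can take more memory than a plain array.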
Upvotes: 1