JahKnows

Reputation: 2706

Python pandas to get specified rows from a CSV file

I am trying to read a very large data set from a CSV file using pandas in Python. The data is too large to take in at once, so I would like to read half of the rows first and then the other half.

I see that read_csv has a chunksize parameter. However, I cannot figure out how to put everything into a matrix or sparse matrix after it is read.

wow = pd.read_csv('TestingCSV.csv', sep=',', header='infer', low_memory=False, chunksize=10, usecols=(range(3, 5)))

This returns an object of type <class 'pandas.io.parsers.TextFileReader'>

What is a possible way to take in the different chunks and then reconstruct a matrix or sparse matrix from them?

Upvotes: 0

Views: 269

Answers (1)

Leb

Reputation: 15963

By default, read_csv reads the whole file in a single call. With chunksize, it instead returns a reader that yields the file one piece at a time.

To get a single DataFrame back, take the "chunks" that wow yields and pass them to pd.concat().
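For the specific case in the question (first half, then second half), one option is read_csv's nrows and skiprows parameters. The following is only a minimal sketch: it assumes the number of data rows is known or has been counted beforehand, and it uses a hypothetical in-memory CSV (csv_text) in place of the real file.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for the real file (header + 6 data rows).
csv_text = "a,b,c\n" + "\n".join(f"{i},{i * 2},{i * 3}" for i in range(6))

n_rows = 6          # assumed known; for a real file, count the lines first
half = n_rows // 2

# First half: stop parsing after `half` data rows.
first_half = pd.read_csv(io.StringIO(csv_text), nrows=half)

# Second half: skip file lines 1..half (the first data rows) but keep line 0, the header.
second_half = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, half + 1))
```

Each half is an ordinary DataFrame, so only one of them needs to be in memory at a time.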

For example:

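# With chunksize, read_csv returns an iterator of DataFrames (assumes pandas is imported as pd)
chunks = pd.read_csv(data, chunksize=100)
# concat consumes the iterator and stitches the chunks into one DataFrame
df = pd.concat(chunks, ignore_index=True)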

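Now you have the full DataFrame and can do whatever analysis you need.

The reader is also iterable, so instead of concatenating you can process one chunk at a time. Note that it can be consumed only once: after pd.concat(chunks) it is exhausted, so create a fresh reader before looping:

for chunk in chunks:
    # do something to each chunk, e.g. inspect its shape
    print(chunk.shape)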
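To get back to the matrix asked about in the question, each chunk can be converted to a NumPy array and the pieces stacked at the end. This is a rough sketch rather than a definitive recipe: csv_text is a hypothetical in-memory stand-in for the real file, and parts and matrix are illustrative names.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical in-memory CSV: a header plus 25 rows of 5 integer columns.
csv_text = "a,b,c,d,e\n" + "\n".join(
    ",".join(str(r * 5 + c) for c in range(5)) for r in range(25)
)

# Read 10 rows at a time, keeping only the columns at positions 3 and 4.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=10, usecols=[3, 4])

# Convert each chunk to a 2-D array, then stack the arrays row-wise.
parts = [chunk.to_numpy() for chunk in reader]
matrix = np.vstack(parts)
```

If a sparse result is needed, the same pattern should carry over to scipy.sparse (convert each chunk to a sparse matrix, then combine with scipy.sparse.vstack), though that variant is not shown here.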

Upvotes: 1
