Reputation: 5565
I am currently reading in a large CSV file (around 100 million lines), using code along the lines of that described in https://docs.python.org/2/library/csv.html, e.g.:
import csv
with open('eggs.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        process_row(row)
This is proving rather slow, I suspect because each line is read individually (requiring lots of read calls to the hard drive). Is there any way of reading the whole CSV file in at once and then iterating over it? Although the file itself is large (e.g. 5 GB), my machine has sufficient RAM to hold it in memory.
Upvotes: 0
Views: 439
Reputation: 16
You can also use the chunksize parameter of pandas' read_csv to read the file in pieces and process each chunk:
import pandas as pd

# chunksize makes read_csv return an iterator of DataFrames,
# each holding at most 10000 rows
df = pd.read_csv("path/test.csv", chunksize=10000)
for data in df:
    print(data.shape)
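For example, a common chunked pattern is to fold each piece into a running result so that only one chunk is ever held in memory; a minimal sketch, reusing the placeholder path from above:

import pandas as pd

total_rows = 0
# Each iteration yields an ordinary DataFrame of at most 10000 rows.
for chunk in pd.read_csv("path/test.csv", chunksize=10000):
    total_rows += len(chunk)  # fold each chunk into a running aggregate
print("rows:", total_rows)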
Upvotes: 0
Reputation: 876
If your CSV file is larger than your RAM, you can use a Dask DataFrame (see the Dask official documentation or the Dask Wikipedia page). With a Dask DataFrame you can do data analysis even on a dataset that does not fit in memory, as in the sketch below.
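A minimal sketch of that approach, assuming the same space-delimited eggs.csv from the question:

import dask.dataframe as dd

# dask.dataframe.read_csv splits the file into partitions lazily instead of
# loading it all at once; work happens only when a result is computed.
df = dd.read_csv('eggs.csv', sep=' ', quotechar='|', header=None)

print(len(df))  # row count across all partitions; triggers one full pass over the file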
Upvotes: 0
Reputation: 168596
Yes, there is a way to read the entire file at once: the third positional argument to open is the buffer size, and making it larger than the file means the data is pulled from disk in one large read rather than line by line:

with open('eggs.csv', 'rb', 5000000000) as ...:
    ...

Reference: https://docs.python.org/2/library/functions.html#open
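Filled in with the reader loop from the question (Python 2 style, as in the question; 5000000000 just stands for "larger than the file"), a full sketch would be:

import csv

# A buffer size larger than the file makes open() read it from disk in one go.
with open('eggs.csv', 'rb', 5000000000) as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        process_row(row)  # the asker's own processing function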
Upvotes: 1
Reputation: 78536
"my machine has sufficient ram to hold that in memory."

Well then, call list on the iterator:

spamreader = list(csv.reader(csvfile, delimiter=' ', quotechar='|'))
Upvotes: 1
Reputation: 1980
import pandas as pd

# Reads the whole file into a DataFrame in one call.
# (DataFrame.from_csv is deprecated in newer pandas; pd.read_csv is the current equivalent.)
df = pd.DataFrame.from_csv('filename.csv')

This will read it in as a pandas DataFrame, so you can do all sorts of fun things with it.
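To then loop over the rows in memory, one sketch using read_csv (process_row stands in for the asker's own function, and sep=' ', quotechar='|', header=None mirror the question's space-delimited, headerless file):

import pandas as pd

# Read the whole file into memory in one call, then iterate row by row.
df = pd.read_csv('filename.csv', sep=' ', quotechar='|', header=None)
for row in df.itertuples(index=False):
    process_row(row)  # the asker's own processing function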
Upvotes: 3