kyrenia

Reputation: 5565

Read in csv file faster

I am currently reading in a large CSV file (around 100 million lines), using code along the lines of the example in https://docs.python.org/2/library/csv.html, e.g.:

import csv
with open('eggs.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        process_row(row)

This is proving rather slow, I suspect because each line is read in individually (requiring lots of read calls to the hard drive). Is there any way of reading the whole CSV file in at once and then iterating over it? Although the file itself is large (e.g. 5 GB), my machine has sufficient RAM to hold that in memory.

Upvotes: 0

Views: 439

Answers (5)

Mohan

Reputation: 16

You can also pass chunksize to pandas' read_csv to read the file in pieces and process each piece:

import pandas as pd

# With chunksize set, read_csv returns an iterator of DataFrames
chunks = pd.read_csv("path/test.csv", chunksize=10000)

for data in chunks:
    print(data.shape)
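To make the chunked approach concrete, here is a minimal sketch that aggregates one column across chunks without ever holding the whole file in memory. It assumes Python 3 and the space-delimited, pipe-quoted format from the question; the function name `sum_column_in_chunks` and the sample data are hypothetical.

```python
import io

import pandas as pd


def sum_column_in_chunks(source, column, chunksize=10000):
    """Stream a CSV in fixed-size chunks and sum one numeric column."""
    total = 0
    # Each iteration yields a DataFrame with at most `chunksize` rows.
    for chunk in pd.read_csv(source, sep=' ', quotechar='|',
                             header=None, chunksize=chunksize):
        total += int(chunk[column].sum())
    return total


# Tiny in-memory stand-in for the real file (hypothetical sample data).
sample = io.StringIO("a 1\nb 2\nc 3\nd 4\ne 5\n")
column_total = sum_column_in_chunks(sample, column=1, chunksize=2)
print(column_total)
```

For a real file, pass the path instead of the StringIO object. The chunk size is a tuning knob, so it is worth trying a few values.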


Upvotes: 0

M_x

Reputation: 876

If your CSV file is larger than your RAM, you can use:

  • Dask (a parallel computing and data analytics library for Python. It supports dynamic task scheduling optimized for computation, as well as big data collections.)

Dask Dataframe from Dask Official ... Dask Wikipedia

With a Dask DataFrame you can do data analysis even on a big dataset.

Upvotes: 0

Robᵩ

Reputation: 168596

Yes, there is a way to read the entire file at once: pass a buffer size large enough to hold it as the third argument to open:

with open('eggs.csv', 'rb', 5000000000) as ...:
    ... 

Reference: https://docs.python.org/2/library/functions.html#open
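For anyone on Python 3, a sketch of the same idea might look like the following. It assumes text mode with `newline=''` (as the Python 3 csv documentation recommends) in place of `'rb'`, and passes the buffer size through the `buffering` keyword. The helper `read_rows` and the sample file contents are hypothetical.

```python
import csv
import os
import tempfile


def read_rows(path, buffer_size=1024 * 1024):
    """Yield parsed rows, reading the file through a buffer of `buffer_size` bytes."""
    with open(path, newline='', buffering=buffer_size) as csvfile:
        reader = csv.reader(csvfile, delimiter=' ', quotechar='|')
        for row in reader:
            yield row


# Write a tiny sample file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "eggs.csv")
    with open(path, "w", newline='') as f:
        f.write("Spam |Lovely Spam| 1\nEggs |Baked Beans| 2\n")
    rows = list(read_rows(path, buffer_size=64 * 1024))

print(rows)
```

Whether a larger buffer actually helps depends on where the time goes. If parsing or `process_row` dominates rather than disk reads, the gain may be small, so it is worth timing both variants.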

Upvotes: 1

Moses Koledoye

Reputation: 78536

my machine has sufficient ram to hold that in memory.

Well then, call list on the iterator:

spamreader = list(csv.reader(csvfile, delimiter=' ', quotechar='|'))
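A self-contained sketch of this, assuming Python 3 and using an in-memory sample in place of the real file (the helper `load_all_rows` and the data are hypothetical):

```python
import csv
import io


def load_all_rows(csvfile):
    """Parse every row up front and return them as a list of lists."""
    return list(csv.reader(csvfile, delimiter=' ', quotechar='|'))


# In-memory stand-in for an open file object (hypothetical sample data).
sample = io.StringIO("Spam |Lovely Spam| 1\nEggs |Baked Beans| 2\n", newline='')
rows = load_all_rows(sample)

# Once materialized, the rows can be iterated as many times as needed.
first_fields = [row[0] for row in rows]
print(len(rows), first_fields)
```

Keep in mind that every field becomes a separate Python string, so the list will likely take noticeably more memory than the file's size on disk.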

Upvotes: 1

Mohammad Athar

Reputation: 1980

import pandas as pd
df = pd.read_csv('filename.csv')

This will read it in as a pandas DataFrame, so you can do all sorts of fun things with it.
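For the file format in the question (space-delimited, `|` as the quote character, no header row), the call would need a few extra arguments. A minimal sketch, with a hypothetical helper `load_frame` and sample data standing in for the real file:

```python
import io

import pandas as pd


def load_frame(source):
    """Load a space-delimited, pipe-quoted CSV without a header row."""
    return pd.read_csv(source, sep=' ', quotechar='|', header=None)


# In-memory stand-in for the real file (hypothetical sample data).
sample = io.StringIO("Spam |Lovely Spam| 1\nEggs |Baked Beans| 2\n")
df = load_frame(sample)

print(df.shape)
print(df[2].sum())  # column-wise operations replace the per-row loop
```

With `header=None`, pandas labels the columns 0, 1, 2, and so on. Pass `names=[...]` if you would rather refer to them by name.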

Upvotes: 3
