Reputation: 693
I'm trying to read a large CSV file in Python; it has some 700 attributes and 101533 rows. I tried reading the file using the pandas.read_csv command, but it ran into a memory issue, so I tried this solution:
import numpy as np

with file("data.csv", "rb") as f:
    title = f.readline()  # if your data has a title line
    data = np.loadtxt(f, delimiter=",")  # if your data is separated by ","
    print np.sum(data, axis=0)  # sum along axis 0 to get the sum of every column
but it fails on the large data set, although it works fine on a small one. How can I read this file in Python?
Upvotes: 2
Views: 2104
Reputation: 107287
You can use the csv module to load your CSV file and the itertools.izip() function to get a generator of columns, then take the first column with next().
Note that csv.reader() returns a reader object, which is a one-shot iterator: it produces the rows on demand rather than loading the whole file into memory.
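import csv
from itertools import izip

with open("data.csv", "rb") as f:
    reader = csv.reader(f)
    next(reader)  # skip the title line, if the file has one
    first_column = next(izip(*reader))  # first tuple produced is the first column
    # csv yields strings, so convert each value before summing
    print sum(float(value) for value in first_column)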
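One thing to watch in the snippet above: izip(*reader) has to unpack every row before it can build the columns, so the whole file still ends up in memory as strings. Below is a minimal sketch of a streaming alternative that keeps only one row and one list of running totals at a time. It assumes Python 3 (where izip no longer exists and the built-in zip is lazy) and that every cell is numeric; the names column_sums and has_header are hypothetical, chosen just for this illustration.

```python
import csv
import io


def column_sums(lines, has_header=True):
    """Sum every column of a numeric CSV while holding one row at a time."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # discard the title line, if present
    totals = None
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [float(cell) for cell in row]  # assumption: all cells are numeric
        if totals is None:
            totals = values
        else:
            totals = [t + v for t, v in zip(totals, values)]
    return totals if totals is not None else []


# Small in-memory stand-in for data.csv
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
print(column_sums(sample))  # [5.0, 7.0, 9.0]
```

For the real file, something like `with open("data.csv", newline="") as f: print(column_sums(f))` should work, since an open file is itself an iterator of lines.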
Upvotes: 1