Stripers247

Reputation: 2335

Memory error when using numpy loadtxt

When I use the following code to load a CSV file with numpy:

import numpy as np

F = np.loadtxt(F, skiprows=1, delimiter=',', usecols=(2, 4, 6))
MASS = F[:, 1]      # usecols=(2,4,6) returns 3 columns, so mass is column 1
#print(MASS)
Z = F[:, 2]         # the last of the three selected columns
N = len(MASS)
print(len(MASS))

I get the following error

Traceback (most recent call last):
  File "C:\Users\Codes\test2.py", line 16, in <module>
    F = np.loadtxt(F,skiprows=1, delimiter=',',usecols=(2,4,6))
  File "C:\Python34\lib\site-packages\numpy\lib\npyio.py", line 859, in loadtxt
    X.append(items)
MemoryError

I have 24 GB of physical memory and the file is 2.70 GB, so I do not understand why I am getting this error. Thanks!

EDIT

I also tried to load the same file like this

from itertools import islice

M, R, TID = [], [], []        # lists must exist before appending

f = open(F)                   # opens the file
f.readline()                  # strips the header
nlines = islice(f, N)         # slices the file to read only the first N lines

for line in nlines:
    if line != '':
        line = line.strip()
        line = line.replace(',', ' ')   # replace commas with spaces
        columns = line.split()
        tid = columns[2]
        m = columns[4]
        r = columns[6]                  # assign variables to columns
        M.append(m)
        R.append(r)                     # append data to the lists
        TID.append(tid)



print(len(M))

and got another MemoryError.

Traceback (most recent call last):
  File "C:\Users\Loop test.py", line 58, in <module>
    M.append(m)
MemoryError

It seems that, in this case, it is running out of memory while building the first list, M.

Upvotes: 2

Views: 3307

Answers (2)

Joe Kington

Reputation: 284562

First off, I'd check that you're actually using a 64-bit build of python. On Windows, it's common to wind up with 32-bit builds, even on 64-bit systems.

Try:

import platform
print(platform.architecture()[0])

If you see 32bit, that's your problem. A 32-bit executable can only address 2 GB of memory, so you can never have an array (or any other single object) over 2 GB.


However, loadtxt is rather inefficient because it works by building up a list and then converting it to a numpy array. Your example code does the same thing. (pandas.read_csv is much more efficient and very heavily optimized, if you happen to have pandas around.)
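
For what it's worth, a minimal sketch of the pandas route (the file name data.csv is a placeholder; the column indices are the ones from the question):

import pandas as pd

# Select only the needed columns; read_csv parses in C and builds the
# array directly instead of going through a list of Python objects.
data = pd.read_csv('data.csv', usecols=[2, 4, 6]).values
mass = data[:, 1]   # second of the three selected columns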

A list is a much less memory-efficient structure than a numpy array. It's roughly an array of pointers: each item costs an extra 8-byte pointer on a 64-bit system, and each value is a full Python object with its own overhead, rather than a raw 8-byte double packed into one contiguous block.
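
You can see the overhead directly, e.g. for a thousand floats:

import sys
import numpy as np

values = [float(i) for i in range(1000)]
arr = np.arange(1000, dtype=np.float64)

# list: the container's pointers plus 1000 separate float objects
print(sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values))
# array: 1000 raw 8-byte doubles in one contiguous block
print(arr.nbytes)   # 8000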

You can improve on this by using numpy.fromiter if you need "leaner" text I/O. See Python out of memory on large CSV file (numpy) for a more complete discussion (shameless plug).
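
As a rough illustration of that approach for a single column (read_column and the file name are placeholders, not from the question):

import numpy as np

def read_column(path, col):
    # Generator that yields one float per row from a single CSV column.
    with open(path) as f:
        next(f)                 # skip the header row
        for line in f:
            yield float(line.split(',')[col])

# fromiter fills the array straight from the generator, so the
# intermediate list of Python floats is never built.
mass = np.fromiter(read_column('data.csv', 4), dtype=np.float64)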


Nonetheless, I don't think your problem is loadtxt. I think it's a 32-bit build of python.

Upvotes: 6

erogol

Reputation: 13614

The problem, I believe, is the requirement for a contiguous block of memory to hold the 2.7 GB of data. The in-memory footprint is most likely larger than 2.7 GB as well, because of the overhead of the data structures and the language runtime. It is better to process the file in chunks, or to use an HDF5-backed data structure: http://www.h5py.org/
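
For example, a minimal sketch of the chunked idea with h5py, assuming the three columns from the question (the file names, dataset name, and chunk size are placeholders):

import h5py

CHUNK = 100000
with open('data.csv') as f, h5py.File('data.h5', 'w') as h5:
    next(f)                                     # skip the header row
    # An extendable dataset: maxshape=(None, 3) enables chunked storage,
    # so only one buffer of rows is ever held in memory at a time.
    dset = h5.create_dataset('cols', shape=(0, 3),
                             maxshape=(None, 3), dtype='f8')
    buf = []
    for line in f:
        parts = line.split(',')
        buf.append([float(parts[2]), float(parts[4]), float(parts[6])])
        if len(buf) == CHUNK:
            dset.resize(dset.shape[0] + len(buf), axis=0)
            dset[-len(buf):] = buf
            buf = []
    if buf:                                     # flush the remainder
        dset.resize(dset.shape[0] + len(buf), axis=0)
        dset[-len(buf):] = buf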

Upvotes: 1
