Stripers247

Reputation: 2335

Memory error when using numpy loadtxt

When I use the following code to load a CSV file with numpy:

import numpy as np

F = np.loadtxt(F, skiprows=1, delimiter=',', usecols=(2, 4, 6))
MASS = F[:, 1]      # usecols=(2,4,6) returns 3 columns, so mass is column 1
#print(MASS)
Z = F[:, 2]         # the last of the three selected columns
N = len(MASS)
print(len(MASS))

I get the following error

Traceback (most recent call last):
  File "C:\Users\Codes\test2.py", line 16, in <module>
    F = np.loadtxt(F,skiprows=1, delimiter=',',usecols=(2,4,6))
  File "C:\Python34\lib\site-packages\numpy\lib\npyio.py", line 859, in loadtxt
    X.append(items)
MemoryError

I have 24 GB of physical memory and the file is 2.70 GB, so I do not understand why I am getting this error. Thanks!

EDIT

I also tried to load the same file like this

from itertools import islice

M, R, TID = [], [], []        # lists must exist before appending

f = open(F)                   # opens the file
f.readline()                  # strips the header
nlines = islice(f, N)         # slices the file to read only the first N lines

for line in nlines:
    if line != '':
        line = line.strip()
        line = line.replace(',', ' ')   # replace commas with spaces
        columns = line.split()
        tid = columns[2]
        m = columns[4]
        r = columns[6]                  # assign variables to columns
        M.append(m)
        R.append(r)                     # append data to the lists
        TID.append(tid)



print(len(M))

and got another MemoryError.

Traceback (most recent call last):
  File "C:\Users\Loop test.py", line 58, in <module>
    M.append(m)
MemoryError

It seems that, in this case, it is running out of memory while building the first list, M.

Upvotes: 2

Views: 3307

Answers (2)

Joe Kington

Reputation: 284562

First off, I'd check that you're actually using a 64-bit build of python. On Windows, it's common to wind up with 32-bit builds, even on 64-bit systems.

Try:

import platform
print(platform.architecture()[0])

If you see 32bit, that's your problem. A 32-bit executable can only address 2 GB of memory, so you can never have an array (or any other single object) over 2 GB.


However, loadtxt is rather inefficient because it works by building up a list and then converting it to a numpy array. Your example code does the same thing. (pandas.read_csv is much more efficient and very heavily optimized, if you happen to have pandas around.)
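
For what it's worth, a minimal sketch of the pandas route (the file name data.csv is a placeholder; the column indices are the ones from the question):

import pandas as pd

# Select only the needed columns; read_csv parses in C and builds the
# array directly instead of going through a list of Python objects.
data = pd.read_csv('data.csv', usecols=[2, 4, 6]).values
mass = data[:, 1]   # second of the three selected columns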

A list is a much less memory-efficient structure than a numpy array. It's roughly an array of pointers: each item costs an extra 8-byte pointer on a 64-bit system, and each value is a full Python object with its own overhead, rather than a raw 8-byte double packed into one contiguous block.
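
You can see the overhead directly, e.g. for a thousand floats:

import sys
import numpy as np

values = [float(i) for i in range(1000)]
arr = np.arange(1000, dtype=np.float64)

# list: the container's pointers plus 1000 separate float objects
print(sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values))
# array: 1000 raw 8-byte doubles in one contiguous block
print(arr.nbytes)   # 8000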

You can improve on this by using numpy.fromiter if you need "leaner" text I/O. See Python out of memory on large CSV file (numpy) for a more complete discussion (shameless plug).
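
As a rough illustration of that approach for a single column (read_column and the file name are placeholders, not from the question):

import numpy as np

def read_column(path, col):
    # Generator that yields one float per row from a single CSV column.
    with open(path) as f:
        next(f)                 # skip the header row
        for line in f:
            yield float(line.split(',')[col])

# fromiter fills the array straight from the generator, so the
# intermediate list of Python floats is never built.
mass = np.fromiter(read_column('data.csv', 4), dtype=np.float64)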


Nonetheless, I don't think your problem is loadtxt. I think it's a 32-bit build of python.

Upvotes: 6

erogol

Reputation: 13614

The problem, I believe, is the requirement for a contiguous block of memory to hold the 2.7 GB of data. The in-memory footprint is most likely larger than 2.7 GB as well, because of the overhead of the data structures and the language runtime. It is better to process the file in chunks, or to use an HDF5-backed data structure: http://www.h5py.org/
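
For example, a minimal sketch of the chunked idea with h5py, assuming the three columns from the question (the file names, dataset name, and chunk size are placeholders):

import h5py

CHUNK = 100000
with open('data.csv') as f, h5py.File('data.h5', 'w') as h5:
    next(f)                                     # skip the header row
    # An extendable dataset: maxshape=(None, 3) enables chunked storage,
    # so only one buffer of rows is ever held in memory at a time.
    dset = h5.create_dataset('cols', shape=(0, 3),
                             maxshape=(None, 3), dtype='f8')
    buf = []
    for line in f:
        parts = line.split(',')
        buf.append([float(parts[2]), float(parts[4]), float(parts[6])])
        if len(buf) == CHUNK:
            dset.resize(dset.shape[0] + len(buf), axis=0)
            dset[-len(buf):] = buf
            buf = []
    if buf:                                     # flush the remainder
        dset.resize(dset.shape[0] + len(buf), axis=0)
        dset[-len(buf):] = buf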

Upvotes: 1
