Reputation: 2335
When I use the following code to load a CSV file with numpy

import numpy as np

F = np.loadtxt(F,skiprows=1, delimiter=',',usecols=(2,4,6))
MASS = F[:,1]  # usecols=(2,4,6) re-indexes the kept columns as 0, 1, 2
#print(MASS)
Z = F[:,2]
N = len(MASS)
print(len(MASS))
I get the following error:
Traceback (most recent call last):
File "C:\Users\Codes\test2.py", line 16, in <module>
F = np.loadtxt(F,skiprows=1, delimiter=',',usecols=(2,4,6))
File "C:\Python34\lib\site-packages\numpy\lib\npyio.py", line 859, in loadtxt
X.append(items)
MemoryError
I have 24 GB of physical memory and the file is 2.70 GB, so I do not understand why I am getting this error. Thanks!
EDIT
I also tried to load the same file like this:

from itertools import islice

M, R, TID = [], [], []  # lists for the mass, radius, and ID columns

f = open(F)  # opens the file
f.readline()  # strips the header
nlines = islice(f, N)  # slices the file to read only N lines
for line in nlines:
    if line != '':
        line = line.strip()
        line = line.replace(',', ' ')  # replaces commas with spaces
        columns = line.split()
        tid = columns[2]
        m = columns[4]
        r = columns[6]  # assigns a variable to each column
        M.append(m)
        R.append(r)  # appends the data to the lists
        TID.append(tid)
print(len(M))
and got another memory error:
Traceback (most recent call last):
File "C:\Users\Loop test.py", line 58, in <module>
M.append(m)
MemoryError
It seems like in this case it is running out of memory while building the first list, M.
Upvotes: 2
Views: 3307
Reputation: 284562
First off, I'd check that you're actually using a 64-bit build of python. On Windows, it's common to wind up with 32-bit builds, even on 64-bit systems.
Try:
import platform
print(platform.architecture()[0])
If you see 32bit, that's your problem. A 32-bit executable can only address 2 GB of memory, so you can never have an array (or other object) over 2 GB.
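If you'd rather check in code, sys.maxsize from the standard library gives the same answer (a minimal sketch):

import sys

# On a 64-bit build sys.maxsize is 2**63 - 1; on a 32-bit build it is 2**31 - 1.
print('64bit' if sys.maxsize > 2**32 else '32bit')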
However, loadtxt is rather inefficient because it works by building up a list and then converting it to a numpy array. Your example code does the same thing. (pandas.read_csv is much more efficient and very heavily optimized, if you happen to have pandas around.)
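For instance, a minimal pandas sketch (the file name 'data.csv' is a placeholder; the column indices come from your snippet):

import pandas as pd

# usecols keeps only the three needed columns; the first row becomes the header.
df = pd.read_csv('data.csv', usecols=[2, 4, 6])
tid = df.iloc[:, 0].values   # original column 2
mass = df.iloc[:, 1].values  # original column 4
r = df.iloc[:, 2].values     # original column 6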
A list is a much less memory-efficient structure than a numpy array. It's analogous to an array of pointers: each item in a list costs an additional 64 bits (a pointer) on top of the Python object it refers to.
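A rough illustration of the difference (exact sizes vary by Python build):

import sys
import numpy as np

values = [float(i) for i in range(1000000)]
arr = np.arange(1000000, dtype=np.float64)

# The list stores a pointer per item, and each item is a full Python float object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(list_bytes)  # roughly 30 MB on a 64-bit build
print(arr.nbytes)  # exactly 8 MB: one packed 8-byte double per value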
You can improve on this by using numpy.fromiter if you need "leaner" text I/O. See Python out of memory on large CSV file (numpy) for a more complete discussion (shameless plug).
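A sketch of that approach, assuming the same comma-delimited layout (the file name and column index are placeholders):

import numpy as np

def iter_column(fname, col):
    # Yield one float per data row from a single CSV column, skipping the header.
    with open(fname) as f:
        next(f)  # skip the header row
        for line in f:
            if line.strip():
                yield float(line.split(',')[col])

# fromiter packs values straight into the array, with no intermediate list of rows.
mass = np.fromiter(iter_column('data.csv', 4), dtype=np.float64)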
Nonetheless, I don't think your problem is loadtxt. I think it's a 32-bit build of python.
Upvotes: 6
Reputation: 13614
The problem, I believe, is the requirement for contiguous memory to hold the 2.7 GB of data. It most probably takes more than 2.7 GB in memory, too, because of the overhead of the data structures and language runtime. It is better to process the file in chunks, or to use an HDF5-like data structure: http://www.h5py.org/
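For example, a rough sketch of streaming one column into a resizable HDF5 dataset with h5py (the file names and column index are assumptions, not taken from the question):

import h5py

CHUNK = 100000
with h5py.File('data.h5', 'w') as h5, open('data.csv') as f:
    next(f)  # skip the header row
    # A resizable one-dimensional dataset of 8-byte floats.
    dset = h5.create_dataset('mass', shape=(0,), maxshape=(None,), dtype='f8')
    buf = []
    for line in f:
        buf.append(float(line.split(',')[4]))
        if len(buf) == CHUNK:
            dset.resize(dset.shape[0] + CHUNK, axis=0)
            dset[-CHUNK:] = buf
            buf = []
    if buf:  # flush the remainder
        dset.resize(dset.shape[0] + len(buf), axis=0)
        dset[-len(buf):] = buf

This way only CHUNK values are ever held in Python memory at once.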
Upvotes: 1