Minh N

Reputation: 61

Very large array handling with astropy and numpy

For some reason I need to use astropy to convert comoving distance to redshift. Basically this involves reading in, looping through, and writing out a list or a numpy array... My problem is that each of my lists typically consists of ~9.5 x 10^6 elements, and the script dies with a MemoryError every time (the traceback below shows it surfacing inside the conversion loop, before the output is even written with numpy.savetxt). The memory usage grows quickly, eventually slows down a bit, but always ends up exceeding the 128 GB limit I have.

If anybody has any idea how I can improve the script below, I am very willing to listen. Thank you!

import sys
import numpy
import astropy.units as unit
from astropy.cosmology import FlatLambdaCDM, z_at_value

cosmo = FlatLambdaCDM(H0=70, Om0=0.3)

inFile = sys.argv[1]
outFile = sys.argv[2]

# Comoving distances are in the third column of the input file
comovingDistance = numpy.loadtxt(inFile, usecols=(2,))

# Invert the distance-redshift relation element by element
Redshift = numpy.zeros(len(comovingDistance))
for i in range(len(comovingDistance)):
    Redshift[i] = z_at_value(cosmo.comoving_distance, comovingDistance[i] * unit.kpc)

numpy.savetxt(outFile, Redshift, fmt='%1.8e')

Below is the error log file:

Traceback (most recent call last):
  File "comoving2redshift.py", line 21, in <module>
    Redshift[i] = z_at_value(cosmo.comoving_distance, comovingDistance[i] * unit.kpc)
  File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/cosmology/funcs.py", line 119, in z_at_value
    fval_zmax = func(zmax)
  File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/cosmology/core.py", line 1195, in comoving_distance
    return self._comoving_distance_z1z2(0, z)
  File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/cosmology/core.py", line 1219, in _comoving_distance_z1z2
    return self._hubble_distance * vectorize_if_needed(f, z1, z2)
  File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/units/quantity.py", line 924, in __mul__
    return super(Quantity, self).__mul__(other)
  File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/units/quantity.py", line 368, in __array_prepare__
    from .quantity_helper import UNSUPPORTED_UFUNCS, UFUNC_HELPERS
MemoryError

Upvotes: 2

Views: 383

Answers (2)

user707650

As an alternative to my other suggested solution, you can split the input file, iterate over the resulting set of (temporary) input files, and concatenate the outputs at the end. Below is a bash wrapper script that, from the outside, should behave identically to the Python script in the question (one input file argument, one output file argument).

#! /bin/bash

nlines=10000
input=$1
output=$2

# Use a unique prefix!
prefix='tmpsplit'
split --lines=$nlines "$input" $prefix

# Assume we split into at most 26^2 files;
# two-letter suffixes are the default for split anyway
for filename in ${prefix}??
do
    outfile="${filename}-out"
    ./calcdist.py "$filename" "$outfile"
done

# This assumes the shell orders the glob expansion alphabetically
cat ${prefix}*out > "$output"

# Clean up
rm ${prefix}*

You may want to use a temporary directory, instead of relying on a unique prefix.
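A minimal sketch of that variant (untested), using mktemp -d so all the intermediate files live in a private scratch directory instead of depending on a unique prefix; calcdist.py is again assumed to be the Python script from the question:

#! /bin/bash

nlines=10000
input=$1
output=$2

# Private scratch directory; the trap removes it on exit, even on failure
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

split --lines=$nlines "$input" "$tmpdir/chunk"

for filename in "$tmpdir"/chunk??
do
    ./calcdist.py "$filename" "${filename}-out"
done

# The glob expands alphabetically, so the order matches the input
cat "$tmpdir"/chunk*-out > "$output"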

Upvotes: 1

user707650

I don't know of any solution intrinsic to numpy, but you can save some memory by writing each result to file as soon as it is computed, instead of after the for loop. That avoids both the allocation of the Redshift array and the allocation done behind the scenes when numpy.savetxt() formats the floating-point values as strings.

inFile = sys.argv[1]
outFile = sys.argv[2]

comovingDistance = numpy.loadtxt(inFile, usecols=(2,))

# Write each redshift out as soon as it is computed
with open(outFile, 'w') as fp:
    for distance in comovingDistance:
        fp.write("{:1.8e}\n".format(
            z_at_value(cosmo.comoving_distance, distance * unit.kpc)))

(NB: untested)

Upvotes: 1
