Reputation: 61
For some reason I need to use astropy to convert comoving distance to redshift. Basically this involves reading in, looping through, and writing out a list or a numpy array. My problem is that each of my lists typically has ~9.5 x 10^6 elements, and I get a MemoryError every time I try to save the output to a new txt file using numpy.savetxt. The memory usage grows quickly, eventually slows down a bit, but always exceeds the 128 GB limit I have.
If anybody has any idea how I can improve the script below, I am very willing to listen. Thank you!
import os
import sys
import glob
import math
import numpy
import astropy
import astropy.units as unit
from astropy.cosmology import *
cosmo = FlatLambdaCDM(H0=70, Om0=0.3)
inFile=sys.argv[1]
outFile=sys.argv[2]
comovingDistance = numpy.loadtxt(inFile, usecols=(2,))
Redshift = numpy.zeros(len(comovingDistance))
for i in range(len(comovingDistance)):
    Redshift[i] = z_at_value(cosmo.comoving_distance, comovingDistance[i] * unit.kpc)
output = open(outFile,'w')
numpy.savetxt(output, Redshift, fmt='%1.8e')
output.close()
Below is the error log file:
Traceback (most recent call last):
File "comoving2redshift.py", line 21, in <module>
Redshift[i] = z_at_value(cosmo.comoving_distance, comovingDistance[i] * unit.kpc)
File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/cosmology/funcs.py", line 119, in z_at_value
fval_zmax = func(zmax)
File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/cosmology/core.py", line 1195, in comoving_distance
return self._comoving_distance_z1z2(0, z)
File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/cosmology/core.py", line 1219, in _comoving_distance_z1z2
return self._hubble_distance * vectorize_if_needed(f, z1, z2)
File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/units/quantity.py", line 924, in __mul__
return super(Quantity, self).__mul__(other)
File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/units/quantity.py", line 368, in __array_prepare__
from .quantity_helper import UNSUPPORTED_UFUNCS, UFUNC_HELPERS
MemoryError
Upvotes: 2
Views: 383
Reputation:
As an alternative to my other suggested solution, you can split the input file, iterate over the new set of (temporary) input files, and concatenate the resulting output files at the end. Below is a bash wrapper script that, from the outside, should behave identically to the Python script in the question (one input file argument, one output file argument).
#! /bin/bash
nlines=10000
input=$1
output=$2
# use a unique prefix!
prefix='tmpsplit'
split --lines=$nlines $input $prefix
outfiles=()
# Assume we only split to a maximum of 26^2 files
# This is the default for split anyway
for filename in ${prefix}??
do
    outfile="${filename}-out"
    ./calcdist.py $filename $outfile
done
# This assumes the shell orders the glob expansion alphabetically
cat ${prefix}*out > $output
# Clean up
rm ${prefix}*
You may want to use a temporary directory, instead of relying on a unique prefix.
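If you would rather keep everything in Python, the same split/process/concatenate idea can be sketched with numpy.array_split: process the distances in fixed-size chunks and append each chunk's redshifts to the output file, so only one chunk of results is held in memory at a time. This is a rough, untested sketch, not the original script; the chunk size of 10000 simply mirrors the nlines value above.
import sys

import numpy
import astropy.units as unit
from astropy.cosmology import FlatLambdaCDM, z_at_value

cosmo = FlatLambdaCDM(H0=70, Om0=0.3)

inFile = sys.argv[1]
outFile = sys.argv[2]
chunkSize = 10000  # analogous to split --lines=10000

comovingDistance = numpy.loadtxt(inFile, usecols=(2,))

with open(outFile, 'w') as fp:
    # Split the array into roughly chunkSize-sized pieces and write each
    # piece's redshifts before moving on to the next one.
    nChunks = max(1, len(comovingDistance) // chunkSize)
    for chunk in numpy.array_split(comovingDistance, nChunks):
        redshift = numpy.array([float(z_at_value(cosmo.comoving_distance, d * unit.kpc))
                                for d in chunk])
        numpy.savetxt(fp, redshift, fmt='%1.8e')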
Upvotes: 1
Reputation:
I don't know of a solution intrinsic to numpy, but you can save some memory by writing each result to the file immediately, rather than after the for loop. That avoids allocating the Redshift array, as well as the memory that numpy.savetxt() allocates behind the scenes when formatting the floating-point values as strings.
inFile=sys.argv[1]
outFile=sys.argv[2]
comovingDistance = numpy.loadtxt(inFile, usecols=(2,))
with open(outFile, 'w') as fp:
    for distance in comovingDistance:
        fp.write("{:1.8e}\n".format(
            z_at_value(cosmo.comoving_distance, distance * unit.kpc)))
(NB: untested)
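If loading the whole distance column with numpy.loadtxt is also a concern, a further (equally untested) variant of the same idea is to stream the input line by line, so neither the input nor the output is ever held in memory as a whole. This sketch assumes, like the loadtxt call above, that the comoving distance in kpc sits in whitespace-separated column 2:
import sys

import astropy.units as unit
from astropy.cosmology import FlatLambdaCDM, z_at_value

cosmo = FlatLambdaCDM(H0=70, Om0=0.3)

inFile = sys.argv[1]
outFile = sys.argv[2]

with open(inFile) as fin, open(outFile, 'w') as fout:
    for line in fin:
        # Column index 2 and the kpc unit follow the question's loadtxt call.
        distance = float(line.split()[2])
        z = z_at_value(cosmo.comoving_distance, distance * unit.kpc)
        fout.write("{:1.8e}\n".format(float(z)))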
Upvotes: 1