Reputation: 653
Summary of a problem:
I am writing some code that checks the content of a FTS file header (data saved from a telescope) using astropy.io.fits. My problem is when I try to open .FTS.gz files instead of .FTS files on a remote server. When I open()
a .FTS.gz I get errors, if I gunzip
the .FTS.gz file, all is good. One of the errors suggest I have an END missing card. Searching online, I used a suggestion of using the ignore_missing_end=True
argument in fits.open()
, but then I get the next error. This next error suggests my FITS file is empty or corrupt, however it is not the case. I can open it with SAOImage DS9 without any problems, plus I have run this handy online tool called fitsverify
which reports no errors in my file. If I download the offending file .FTS.gz and run a similar code to fits.open()
this file locally, I get no errors at all. An example of an offending file (used in the code below) is now uploaded here.
The Astropy documentation says:
"Working with compressed files
The open()
function will seamlessly open FITS files that have been compressed with gzip, bzip2 or pkzip. Note that in this context we’re talking about a fits file that has been compressed with one of these utilities - e.g. a .fits.gz file."
How do I open a remote .FTS.gz file without downloading it? I have hundreds of thousands of files like this, so downloading is not an option and it is not just one file that gives a problem, it is all of them.
Thanks, Aina.
Code and errors:
CODE TO OPEN A REMOTE .FTS.gz FILE:
from astropy.io import fits
import paramiko
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.load_system_host_keys()
client.connect('myhostname', username='myusername', password='mypassword')
apath = '/path/to/folder/to/search'
apattern = '"RUN0001.FTS.gz"'
rawcommand = 'find {path} -name {pattern}'
command = rawcommand.format(path=apath, pattern=apattern)
stdin, stdout, stderr = client.exec_command(command)
filelist = stdout.read().splitlines()
for i in filelist:
sftp_client = client.open_sftp()
remote_file = sftp_client.open(i)
hdulist = fits.open(remote_file)
client.close()
ERROR:
Traceback (most recent call last):
File "/Users/amusaeva/Documents/PyCharm/FITSHeaders/stackoverflow.py", line 17, in <module>
hdulist = fits.open(remote_file)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 166, in fitsopen
lazy_load_hdus, **kwargs)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 404, in fromfile
lazy_load_hdus=lazy_load_hdus, **kwargs)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 1040, in _readfrom
read_one = hdulist._read_next_hdu()
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 1135, in _read_next_hdu
hdu = _BaseHDU.readfrom(fileobj, **kwargs)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/base.py", line 329, in readfrom
**kwargs)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/base.py", line 394, in _readfrom_internal
header = Header.fromfile(data, endcard=not ignore_missing_end)
File "/Library/Python/2.7/site-packages/astropy/io/fits/header.py", line 450, in fromfile
padding)[1]
File "/Library/Python/2.7/site-packages/astropy/io/fits/header.py", line 519, in _from_blocks
raise IOError('Header missing END card.')
IOError: Header missing END card.
Process finished with exit code 1
CHANGING THE CODE ABOVE FOR ONE LINE ONLY:
hdulist = fits.open(remote_file, ignore_missing_end=True)
ERROR:
WARNING: VerifyWarning: Error validating header for HDU #0 (note: Astropy uses zero-based indexing).
Header size is not multiple of 2880: 7738429
There may be extra bytes after the last HDU or the file is corrupted. [astropy.io.fits.hdu.hdulist]
Traceback (most recent call last):
File "/Users/amusaeva/Documents/PyCharm/FITSHeaders/stackoverflow.py", line 17, in <module>
hdulist = fits.open(remote_file, ignore_missing_end=True)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 166, in fitsopen
lazy_load_hdus, **kwargs)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 404, in fromfile
lazy_load_hdus=lazy_load_hdus, **kwargs)
File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 1044, in _readfrom
raise IOError('Empty or corrupt FITS file')
IOError: Empty or corrupt FITS file
Process finished with exit code 1
CODE TO OPEN THE OFFENDING .FTS.gz FILE LOCALLY PRODUCES NO ERRORS:
import os
from astropy.io import fits
folderTosearch = "/path/to/folder/to/search/locally";
for root, dirs, files in os.walk(folderTosearch):
for file in files:
if file.endswith("RUN0001.FTS.gz"):
hdulist = fits.open(os.path.join(root, file))
Upvotes: 1
Views: 2317
Reputation:
This happens because the sftp call passes some variant of a file-like object (which has a .read()
method that fits.open()
will use.
The file like object, however, is still a gzip file. Astropy checks whether a file is zipped only for file names, that is, when the argument to fits.open()
is a string (that happens to be a path). Astropy does not appear to test for the magic bytes that identify a byte stream as a gzip file. Oddly enough, it does do this verification when path strings are passed. Arguably, this may be a slight shortcoming in the astropy.io.fits module, but perhaps there's a reason for it.
(Disclaimer: the above conclusion is from scanning quickly through the relevant source code; I may have missed something. Hopefully people will correct me if so.)
One solution is to do the unzipping yourself. I've cobbled up the following:
from cStringIO import StringIO
import zlib
<...>
for i in filelist:
sftp_client = client.open_sftp()
remote_file = sftp_client.open(i)
decompressed = StringIO(
zlib.decompress(remote_file.read(), zlib.MAX_WBITS|32))
hdulist = fits.open(decompressed)
client.close()
Above, we're reading the full contents of the remote file (remote_file.read()
, then uncompressing the contents. That results in a string, so we wrap it in a StringIO
instance to make it a file-like object again, that we can pass to fits.open()
. (For the zlib.MAX_WBITS|32
argument: see this answer.)
Alternatively, you can sftp the file to local disk, and then read the file (with the local filename) locally. The above just keeps everything in memory.
Upvotes: 2