Aina

Reputation: 653

How to open a remote .FTS.gz file with astropy.io.fits.open()?

Summary of a problem:

I am writing some code that checks the header of an FTS file (data saved from a telescope) using astropy.io.fits. My problem arises when I try to open .FTS.gz files instead of .FTS files on a remote server. When I open() a .FTS.gz file I get errors; if I gunzip the file first, all is good. The first error suggests that the header is missing an END card. After searching online, I followed a suggestion to pass ignore_missing_end=True to fits.open(), but then I get a second error. This second error suggests my FITS file is empty or corrupt, but that is not the case: I can open it with SAOImage DS9 without any problems, and the online tool fitsverify reports no errors in my file. If I download the offending .FTS.gz file and run similar code that calls fits.open() on it locally, I get no errors at all. An example of an offending file (used in the code below) is now uploaded here.

The Astropy documentation, under "Working with compressed files", says: "The open() function will seamlessly open FITS files that have been compressed with gzip, bzip2 or pkzip. Note that in this context we’re talking about a fits file that has been compressed with one of these utilities - e.g. a .fits.gz file."

How do I open a remote .FTS.gz file without downloading it? I have hundreds of thousands of files like this, so downloading is not an option. It is not just one file that gives a problem; it is all of them.

Thanks, Aina.

Code and errors:

CODE TO OPEN A REMOTE .FTS.gz FILE:

from astropy.io import fits
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.load_system_host_keys()
client.connect('myhostname', username='myusername', password='mypassword')
apath = '/path/to/folder/to/search'
apattern = '"RUN0001.FTS.gz"'
rawcommand = 'find {path} -name {pattern}'
command = rawcommand.format(path=apath, pattern=apattern)
stdin, stdout, stderr = client.exec_command(command)
filelist = stdout.read().splitlines()
for i in filelist:
    sftp_client = client.open_sftp()
    remote_file = sftp_client.open(i)
    hdulist = fits.open(remote_file)
client.close()

ERROR:

Traceback (most recent call last):
  File "/Users/amusaeva/Documents/PyCharm/FITSHeaders/stackoverflow.py", line 17, in <module>
    hdulist = fits.open(remote_file)
  File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 166, in fitsopen
    lazy_load_hdus, **kwargs)
  File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 404, in fromfile
    lazy_load_hdus=lazy_load_hdus, **kwargs)
  File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 1040, in _readfrom
    read_one = hdulist._read_next_hdu()
  File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 1135, in _read_next_hdu
    hdu = _BaseHDU.readfrom(fileobj, **kwargs)
  File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/base.py", line 329, in readfrom
    **kwargs)
  File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/base.py", line 394, in _readfrom_internal
    header = Header.fromfile(data, endcard=not ignore_missing_end)
  File "/Library/Python/2.7/site-packages/astropy/io/fits/header.py", line 450, in fromfile
    padding)[1]
  File "/Library/Python/2.7/site-packages/astropy/io/fits/header.py", line 519, in _from_blocks
    raise IOError('Header missing END card.')
IOError: Header missing END card.

Process finished with exit code 1

CHANGING THE CODE ABOVE FOR ONE LINE ONLY:

hdulist = fits.open(remote_file, ignore_missing_end=True)

ERROR:

WARNING: VerifyWarning: Error validating header for HDU #0 (note: Astropy uses zero-based indexing).
    Header size is not multiple of 2880: 7738429
There may be extra bytes after the last HDU or the file is corrupted. [astropy.io.fits.hdu.hdulist]
Traceback (most recent call last):
  File "/Users/amusaeva/Documents/PyCharm/FITSHeaders/stackoverflow.py", line 17, in <module>
    hdulist = fits.open(remote_file, ignore_missing_end=True)
  File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 166, in fitsopen
    lazy_load_hdus, **kwargs)
  File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 404, in fromfile
    lazy_load_hdus=lazy_load_hdus, **kwargs)
  File "/Library/Python/2.7/site-packages/astropy/io/fits/hdu/hdulist.py", line 1044, in _readfrom
    raise IOError('Empty or corrupt FITS file')
IOError: Empty or corrupt FITS file

Process finished with exit code 1

CODE TO OPEN THE OFFENDING .FTS.gz FILE LOCALLY PRODUCES NO ERRORS:

import os
from astropy.io import fits

folderTosearch = "/path/to/folder/to/search/locally"
for root, dirs, files in os.walk(folderTosearch):
    for file in files:
        if file.endswith("RUN0001.FTS.gz"):
            hdulist = fits.open(os.path.join(root, file))

Upvotes: 1

Views: 2317

Answers (1)

user707650

This happens because the sftp call returns a file-like object (one with a .read() method that fits.open() will use).
The file-like object, however, still yields gzip-compressed bytes. Astropy appears to check whether a file is compressed only when the argument to fits.open() is a string (that is, a path). It does not appear to test for the magic bytes that identify a byte stream as gzip when it is handed a generic file-like object, even though it does perform that check when a path string is passed. Arguably, this is a slight shortcoming in the astropy.io.fits module, but perhaps there's a reason for it.
(Disclaimer: the above conclusion is from scanning quickly through the relevant source code; I may have missed something. Hopefully people will correct me if so.)
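
To illustrate what "testing for the magic bytes" means, here is a minimal sketch in Python 3 (the question itself runs on Python 2.7). The helper name looks_like_gzip is made up for this example and is not part of astropy or paramiko; it only assumes a seekable binary file-like object.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(fileobj):
    """Return True if a seekable binary file-like object starts with the gzip magic bytes.

    Hypothetical helper for illustration only.
    """
    start = fileobj.tell()
    head = fileobj.read(2)
    fileobj.seek(start)  # rewind so the caller can still read from the same position
    return head == GZIP_MAGIC


# In-memory stand-ins for a compressed and an uncompressed FITS header start.
compressed = io.BytesIO(gzip.compress(b"SIMPLE  =                    T"))
plain = io.BytesIO(b"SIMPLE  =                    T")
print(looks_like_gzip(compressed))  # True
print(looks_like_gzip(plain))       # False
```

A compressed stream begins with these two bytes rather than the SIMPLE keyword, which would explain why the header parser reports a missing END card when it is fed the raw gzip data.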

One solution is to do the unzipping yourself. I've cobbled up the following:

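from cStringIO import StringIO  # Python 2; on Python 3 use io.BytesIO instead
import zlib

<...>

# Open one SFTP session and reuse it for every file.
sftp_client = client.open_sftp()
for i in filelist:
    remote_file = sftp_client.open(i)
    # Read the whole compressed file into memory and gunzip it;
    # MAX_WBITS|32 lets zlib auto-detect the gzip (or zlib) header.
    decompressed = StringIO(
        zlib.decompress(remote_file.read(), zlib.MAX_WBITS|32))
    remote_file.close()
    hdulist = fits.open(decompressed)
# Close the connection only after all files have been processed.
client.close()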

Above, we read the full contents of the remote file (remote_file.read()), then decompress them. That results in a string, so we wrap it in a StringIO instance to make it a file-like object again, which we can pass to fits.open(). (For the zlib.MAX_WBITS|32 argument: see this answer.)
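
For anyone on Python 3, where cStringIO no longer exists, the same idea could be sketched with gzip.decompress() and io.BytesIO. The function name gunzip_to_filelike is an assumption for this example, and the in-memory BytesIO below only stands in for the SFTP file handle so that the sketch runs without a server.

```python
import gzip
import io


def gunzip_to_filelike(remote_file):
    """Read a gzip-compressed binary file-like object in full and return the
    decompressed payload wrapped in an in-memory, seekable BytesIO.

    Hypothetical helper for illustration only.
    """
    return io.BytesIO(gzip.decompress(remote_file.read()))


# Stand-in for an SFTP file handle: any object whose .read() returns bytes.
payload = b"SIMPLE  =                    T".ljust(80)
fake_remote = io.BytesIO(gzip.compress(payload))

decompressed = gunzip_to_filelike(fake_remote)
print(decompressed.read() == payload)  # True
```

In the loop above, the returned object would take the place of the StringIO instance passed to fits.open(). As before, each file is held fully in memory while it is being read.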


Alternatively, you can sftp the file to local disk, and then read the file (with the local filename) locally. The above just keeps everything in memory.
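
A rough sketch of that alternative, in Python 3: copy each file into a temporary directory with the SFTP client's get(remotepath, localpath) method, open it by local filename, and let the temporary directory clean up afterwards. The helper fetch_to_tempdir and the FakeSFTP stand-in are assumptions made up for this example so that it runs without a server; with paramiko you would pass the object returned by client.open_sftp() instead.

```python
import os
import posixpath
import tempfile


def fetch_to_tempdir(sftp_client, remote_path, tmpdir):
    """Copy remote_path into tmpdir via sftp_client.get() and return the local path.

    Hypothetical helper for illustration only.
    """
    # Remote paths use forward slashes, so take the file name with posixpath.
    local_path = os.path.join(tmpdir, posixpath.basename(remote_path))
    sftp_client.get(remote_path, local_path)
    return local_path


class FakeSFTP:
    """Stand-in for an SFTP client; it only writes a few placeholder bytes."""

    def get(self, remotepath, localpath):
        with open(localpath, "wb") as fh:
            fh.write(b"demo")


with tempfile.TemporaryDirectory() as tmpdir:
    path = fetch_to_tempdir(
        FakeSFTP(), "/path/to/folder/to/search/RUN0001.FTS.gz", tmpdir)
    print(os.path.basename(path))  # RUN0001.FTS.gz
    # Here one would call fits.open(path) on the local copy.
```

Fetching, reading and deleting one file at a time keeps disk usage bounded even with a very large number of remote files, at the cost of transferring each file in full.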

Upvotes: 2
