Reputation: 3
I have a problem which ought to be trivial but seems to have been massively over-complicated by the column-based nature of FITS BinTableHDU.
The script I'm writing should be trivial: iterate through a FITS file and write a subset of rows to an identically formatted FITS file, reducing the row count from c700k/3.6GB to about 350 rows. I have processed the input file and have each row that I want to save in a python array of FITS records:
outarray = []
self.indata=Table.read(self.infile, hdu=1)
for r in self._indata:
RecPassesFilter = FilterProc(r, self)
#
# Add to output array only if passes all filters...
#
if RecPassesFilter:
outarray.append(r)
Now, I've created an empty BintableHDU with exactly the same columns and formats and I want to add the filtered data:
[...much omitted code later...}
mycols = []
for inputcol in self._coldefs:
mycols.append(fits.Column(name=inputcol.name, format=inputcol.format))
# Next line should produce an empty BinTableHDU in the identical format to the output data
SaveData = fits.BinTableHDU.from_columns(mycols)
for s in self._outdata:
SaveData.data.append(s)
Now that last line not only fails, but every variant of it (SaveData.append() or .add_row() or whatever) also fails with a "no such method" error. There seems to be a singular lack of documentation on how to do the trivial task of adding a record. Clearly I am missing something, but two days later I'm still drawing a blank.
Can anyone point me in the right direction here?
Upvotes: 0
Views: 1519
Reputation: 23306
I just recalled that BinTableHDU.from_columns
takes an nrows
argument. If you pass that along with the columns of an existing table HDU, it will copy the column structure but initialize subsequent rows with empty data:
>>> hdul = fits.open('astropy/io/fits/tests/data/table.fits')
>>> table = hdul[1]
>>> table.columns
ColDefs(
name = 'target'; format = '20A'
name = 'V_mag'; format = 'E'
)
>>> table.data
FITS_rec([('NGC1001', 11.1), ('NGC1002', 12.3), ('NGC1003', 15.2)],
dtype=(numpy.record, [('target', 'S20'), ('V_mag', '>f4')]))
>>> new_table = fits.BinTableHDU.from_columns(table.columns, nrows=5)
>>> new_table.columns
ColDefs(
name = 'target'; format = '20A'
name = 'V_mag'; format = 'E'
)
>>> new_table.data
FITS_rec([('NGC1001', 11.1), ('NGC1002', 12.3), ('NGC1003', 15.2),
('', 0. ), ('', 0. )],
dtype=(numpy.record, [('target', 'S20'), ('V_mag', '<f4')]))
As you can see, this still copies the data from the original columns. I think the idea behind this originally was for adding new rows to an existing table. However, you can also initialize a completely empty new table by passing fill=True
:
>>> new_table_zeroed = fits.BinTableHDU.from_columns(table.columns, nrows=5, fill=True)
>>> new_table_zeroed.data
FITS_rec([('', 0.), ('', 0.), ('', 0.), ('', 0.), ('', 0.)],
dtype=(numpy.record, [('target', 'S20'), ('V_mag', '<f4')]))
Upvotes: 0
Reputation: 3
OK, I managed to resolve this with some brute force and nested iterations essentially to create column data arrays on the fly. It's not much in terms of code and I don't care that it's inefficient as I won't need to run it too often. Example code here:
with fits.open(self._infile) as HDUSet:
tableHDU=HDUSet[1]
self._coldefs = tableHDU.columns
FITScols = []
for inputcol in self._coldefs:
NewColData = []
for r in self._outdata:
NewColData.append(r[inputcol.name])
FITScols.append(fits.Column(name=inputcol.name, format=inputcol.format, array=NewColData))
SaveData = fits.BinTableHDU.from_columns(FITScols)
SaveData.writeto(fname)
This solves my problem for a 350 row subset. I haven't yet dared try it for the 250K row subset that I need for the next part of my project!
Upvotes: 0