Slater Oden

Reputation: 1

Best way to read and write CSV files utilizing astroquery to obtain large data sets

from astroquery.mast import Catalogs
import numpy as np
from astropy.io import ascii

myfile='/Users/slaterjonesoden/Desktop/GALEX_analysis/RQE_sample_data.csv'
sample = ascii.read(myfile, format='csv', delimiter=',', guess=True)
galexMatchCatalog=[]
iteration = 1

for each_galaxy in sample:
    catalogData = Catalogs.query_object(str(each_galaxy['RAgal'])+str(' ')+str(each_galaxy['DECgal']), catalog="Galex")
    print(iteration)
    iteration += 1
    if iteration > 2:
        break
    if len(catalogData)!=0:
        sdss_info = [each_galaxy['RAgal'], each_galaxy['DECgal']]
        galexMatchCatalog.append(list(np.array(catalogData)[0])+sdss_info)

header = catalogData.colnames + ['sdss_ra', 'sdss_dec']

ascii.write(galexMatchCatalog, '/Users/slaterjonesoden/Desktop/GALEX_analysis_codes/172_RQEs_galex_mast_match.csv', format='csv', names=header, overwrite=True)

I am trying to get this code to match 172 galaxies in a CSV file on my computer with galaxies from the astroquery.mast module, using the Catalogs query function.

The catalog of interest in astroquery.mast is GALEX (Galaxy Evolution Explorer). Essentially, I want the code to loop through the 172 galaxies in my CSV file and match them (by RA and DEC) with the galaxies stored in the GALEX catalog in astroquery.mast.

After matching these galaxies, I then want to write a new CSV file containing the GALEX data for them.

My first attempt was to define a writeCsvFile() function, but it did not write the CSV file properly.

My next attempt was to import ascii from astropy.io and use ascii.read() and ascii.write() to read and write the CSV files. At first I thought I was in luck because the for loop ran, but once the loop finished, the ascii.write() call failed. Below is the error message I get when running the code:

[Screenshot: error message from running the ascii version of the code]

The important line in the screenshot above: ValueError: Arguments "names" and "dtype" must match number of columns

Any help from someone with experience using astroquery.mast and reading/writing CSV files would be appreciated.

I am running this code with Python 3.6, using the astroconda3 interpreter.

Here is a picture of the code as well: 172_RQEs_GALEX_mast_match.py

Upvotes: 0

Views: 623

Answers (1)

Iguananaut

Reputation: 23346

I think I see what your error is. When you build galexMatchCatalog, each entry you append is one row: the first row of catalogData plus your two coordinates [RAgal, DECgal], which you want to add to the row as extra columns.

Then you pass ascii.write a list of lists of row-wise data.

This is a bit counter-intuitive, but when you pass ascii.write a list, it assumes it is a list of columns, not rows. Each of your rows is therefore treated as a column, so the number of columns it sees (the number of matched galaxies) does not match the number of entries in header, and it fails with that ValueError. The error message could certainly be more helpful here.

For example, the first example in the documentation for ascii.write passes it a list of columns. (Column-wise storage is the convention because it is typically more efficient for column-oriented operations.)

In fact, if you pass ascii.write something other than an astropy Table, it will try to construct a Table from the first argument (you can see this in the traceback, on the line that says table = Table(table, names=names)).

Likewise, the Table constructor interprets a list as a list of columns. To pass it a list of rows instead, use the rows argument, something like:

>>> from astropy.table import Table
>>> table = Table(rows=galexMatchCatalog, names=header)
>>> table.write(filename, format='ascii.csv', overwrite=True)

More generally, though, here's how I might do this (there are many ways).

With the sample you read from your CSV file, you already have an astropy Table containing the RAgal and DECgal columns. You could make a sub-table containing just those columns like:

coords = sample[['RAgal', 'DECgal']]

If you want, you can also rename the columns to match what you want in your final output:

coords.rename_columns(['RAgal', 'DECgal'], ['sdss_ra', 'sdss_dec'])

Now you want to loop over all the coordinate pairs, query the catalog, and build up a list of rows from the query results, including the coordinates you used to look them up. Again, there are many ways to do this, some more efficient than others, but one way is to use hstack and vstack:

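from astropy.table import hstack, vstack
from astroquery.mast import Catalogs

galex_match_catalog = []

for galaxy_coords in coords:
    # Query GALEX at this SDSS position, passed as an "RA DEC" string
    catalog_data = Catalogs.query_object(f'{galaxy_coords["sdss_ra"]} {galaxy_coords["sdss_dec"]}', catalog='Galex')
    if len(catalog_data) > 0:
        # Keep the first GALEX row and append the SDSS coordinates as extra columns
        galex_match_catalog.append(hstack([catalog_data[0], galaxy_coords]))

# Finally, stack the one-row tables into a single table and write it:
filename = '172_RQEs_galex_mast_match.csv'
galex_match_catalog = vstack(galex_match_catalog)
galex_match_catalog.write(filename, format='ascii.csv', overwrite=True)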

Upvotes: 0
