Kazi Abu Rousan

Reputation: 1

How to access star RA and dec from Gaia data source in R?

I am a beginner in R. For one of my projects I am trying to find the positions of stars using R. The data are available in the Gaia data archive and can be downloaded. In Python I can import the data directly from code with astroquery (https://astroquery.readthedocs.io/en/latest/gaia/gaia.html). Is it possible to do the same in R?

I am currently running this code:

library(httr)
library(jsonlite)
library(xml2)

get_gaia_star_data <- function(star_name) {
  # URL encode the star name for use in the query
  star_name_encoded <- URLencode(star_name, reserved = TRUE)

  # Construct the Gaia TAP query to find the star by name
  query <- paste0(
    "SELECT TOP 1 source_id, ra, dec ",
    "FROM gaiadr2.gaia_source ",
    "WHERE CONTAINS(POINT('ICRS', ra, dec), ",
    "CIRCLE('ICRS', 0, 0, 180)) = 1 AND ",
    "source_id IN (SELECT source_id FROM gaiadr2.gaia_source WHERE ",
    "CONTAINS(POINT('ICRS', ra, dec), ",
    "CIRCLE('ICRS', ", star_name_encoded, ")) = 1"
  )

  # URL for the Gaia Archive TAP service
  base_url <- "https://gea.esac.esa.int/tap-server/tap"

  # Construct the request body
  body <- list(
    REQUEST = "doQuery",
    LANG = "ADQL",
    FORMAT = "json",
    PHASE = "RUN",
    QUERY = query
  )

  # Perform the query
  response <- tryCatch({
    POST(paste0(base_url, "/sync"), body = body, encode = "form")
  }, error = function(e) {
    stop("Failed to connect to Gaia Archive. Error: ", e$message)
  })

  # Check if the request was successful
  if (status_code(response) != 200) {
    stop(paste("Failed to query Gaia Archive. HTTP status code:", status_code(response)))
  }

  # Parse the response content
  content <- content(response, "text", encoding = "UTF-8")
  data <- tryCatch({
    fromJSON(content)
  }, error = function(e) {
    stop("Failed to parse JSON response.")
  })

  # Check if the response contains data
  if (length(data$data) == 0) {
    stop("No results found for the star name.")
  }

  # Extract RA and Dec from the response
  ra <- as.numeric(data$data[[1]]$ra)
  dec <- as.numeric(data$data[[1]]$dec)

  # Return the coordinates as a list
  return(list(
    RA = ra,
    Dec = dec
  ))
}

# Example usage
star_name <- "Sirius"
star_location <- get_gaia_star_data(star_name)
print(star_location)

but the HTTP status is 400 no matter what I do.

Upvotes: 0

Views: 104

Answers (2)

Kazi Abu Rousan

Reputation: 1

I have figured it out, thanks also to @Andre for his help. I have tried to keep it as close as possible to what we do in Python. Here is the code:

library(httr)      # POST(), status_code(), content()
library(jsonlite)  # fromJSON()

get_gaia_data <- function(query, col_n) {
  # URL for the Gaia Archive TAP service
  base_url <- "https://gea.esac.esa.int/tap-server/tap/sync"

  # Construct the request body
  body <- list(
    REQUEST = "doQuery",
    LANG = "ADQL",
    FORMAT = "json",
    PHASE = "RUN",
    QUERY = query
  )

  # Perform the query
  response <- tryCatch({
    POST(base_url, body = body, encode = "form")
  }, error = function(e) {
    stop("Failed to connect to Gaia Archive. Error: ", e$message)
  })

  # Check if the request was successful
  if (status_code(response) != 200) {
    stop(paste("Failed to query Gaia Archive. HTTP status code:", status_code(response)))
  }

  # Parse the response content
  content <- content(response, "text", encoding = "UTF-8")
  data <- tryCatch({
    fromJSON(content, flatten = TRUE)
  }, error = function(e) {
    stop("Failed to parse JSON response.")
  })

  # Check if the response contains data
  if (length(data$data) == 0) {
    stop("No results found for the query.")
  }

  # Convert to data frame and set correct column names
  df <- as.data.frame(data$data)
  colnames(df) <- col_n

  return(df)
}

This function takes two arguments: the first is the ADQL query string and the second is a vector of column names for the result. Here is a working example that fetches HR diagram data.

query <- paste0(
  "SELECT source_id, ra, dec, phot_bp_mean_mag, phot_rp_mean_mag, phot_g_mean_mag, parallax ",
  "FROM external.gaiaedr3_gcns_main_1 ",
  "WHERE parallax >50"
)
# Example usage: fetch nearby stars (parallax > 50) to use for an HR diagram
col_n <- c("source_id", "ra", "dec", "phot_bp_mean_mag", "phot_rp_mean_mag", "phot_g_mean_mag", "parallax")

gaia_data <- get_gaia_data(query, col_n)
print(head(gaia_data))
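
For the original question (the position of a single star), the same function can be reused with a cone search. As far as I know, ADQL itself has no name lookup, so a name such as "Sirius" cannot go inside CIRCLE(...); the query needs numeric coordinates in degrees. Below is a minimal sketch of a query builder. The helper name cone_query, its default radius, and the hand-typed approximate coordinates are assumptions for illustration only; the table is the gaiadr2.gaia_source from the question.

```r
# Hypothetical helper: build an ADQL cone search around (ra, dec), all in degrees.
cone_query <- function(ra, dec, radius_deg = 0.01, top = 10) {
  stopifnot(is.numeric(ra), is.numeric(dec), is.numeric(radius_deg))
  sprintf(
    paste0(
      "SELECT TOP %d source_id, ra, dec ",
      "FROM gaiadr2.gaia_source ",
      "WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), ",
      "CIRCLE('ICRS', %.6f, %.6f, %.6f))"
    ),
    as.integer(top), ra, dec, radius_deg
  )
}

# Approximate coordinates typed in by hand (assumed values, not fetched from a service)
query <- cone_query(ra = 101.287, dec = -16.716, radius_deg = 0.05)
cat(query, "\n")
```

The resulting string can then be passed on as get_gaia_data(query, c("source_id", "ra", "dec")). Whether a given star is actually in the catalogue is a separate matter.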

The example ADQL is taken from: See the first talk. If you plot the data following the usual rules for an HR diagram, you should get the same plot.
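
A minimal sketch of the plotting step, assuming the usual HR diagram conventions: colour BP - RP on the x-axis and absolute G magnitude on the y-axis with brighter stars at the top, and parallax given in milliarcseconds. The helper name hr_columns and the three-row demo data frame are made up for illustration; in practice the gaia_data returned above would be passed in instead.

```r
# Hypothetical helper: add colour and absolute magnitude columns.
hr_columns <- function(df) {
  # JSON results may arrive as character, so convert explicitly
  g   <- as.numeric(df$phot_g_mean_mag)
  bp  <- as.numeric(df$phot_bp_mean_mag)
  rp  <- as.numeric(df$phot_rp_mean_mag)
  plx <- as.numeric(df$parallax)        # assumed to be in milliarcseconds

  df$bp_rp <- bp - rp
  # distance in parsec is 1000 / parallax, so M = m + 5 * log10(parallax) - 10
  df$abs_g <- g + 5 * log10(plx) - 10
  # drop rows where either quantity is missing or not finite
  df[is.finite(df$bp_rp) & is.finite(df$abs_g), ]
}

# Made-up rows, only to show the shape of the input
demo <- data.frame(
  phot_g_mean_mag  = c(10.0, 12.5, 8.0),
  phot_bp_mean_mag = c(10.8, 14.0, 8.3),
  phot_rp_mean_mag = c(9.2, 11.3, 7.6),
  parallax         = c(100, 60, NA)
)
hr <- hr_columns(demo)

# Reverse the y-axis so that bright stars (small magnitudes) are at the top
plot(hr$bp_rp, hr$abs_g,
     xlab = "BP - RP", ylab = "Absolute G magnitude",
     ylim = rev(range(hr$abs_g)), pch = 20)
```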

Upvotes: 0

Andre Wildberg

Reputation: 19163

I don't think R has an equivalent of Python's astroquery, but here's an approach that's equally simple (imho).

Set the URLs

root_url <- "http://cdn.gea.esac.esa.int/Gaia/gdr2/gaia_source_with_rv/csv/"

files_csv <- c("GaiaSource_1584380076484244352_2200921635402776448.csv.gz", 
               "GaiaSource_2200921875920933120_3650804325670415744.csv.gz", 
               "GaiaSource_2851858288640_1584379458008952960.csv.gz", 
               "GaiaSource_3650805523966057472_4475721411269270528.csv.gz", 
               "GaiaSource_4475722064104327936_5502601461277677696.csv.gz", 
               "GaiaSource_5502601873595430784_5933051501826387072.csv.gz", 
               "GaiaSource_5933051914143228928_6714230117939284352.csv.gz", 
               "GaiaSource_6714230465835878784_6917528443525529728.csv.gz")

Load the desired columns into dat, a list of tibbles (data frames), using read_csv with col_select inside an sapply

library(readr) # also loads library(curl), install if not present, otherwise the
               # download timeout with 'url' might be too small

dat <- sapply(paste0(root_url, files_csv), \(x) 
  read_csv(x, progress = FALSE, col_select = c("source_id", "ra", "dec")), simplify = FALSE)

output

dat
$`http://cdn.gea.esac.esa.int/Gaia/gdr2/gaia_source_with_rv/csv/GaiaSource_1584380076484244352_2200921635402776448.csv.gz`
# A tibble: 1,000,000 × 3
   source_id    ra   dec
       <dbl> <dbl> <dbl>
 1   1.58e18  183.  63.5
 2   1.58e18  183.  63.7
 3   1.58e18  183.  63.8
 4   1.58e18  184.  63.6
 5   1.58e18  183.  63.7
 6   1.58e18  183.  63.8
 7   1.58e18  183.  63.8
 8   1.58e18  183.  63.9
 9   1.58e18  183.  63.9
10   1.58e18  183.  63.9
# ℹ 999,990 more rows
# ℹ Use `print(n = ...)` to see more rows

$`http://cdn.gea.esac.esa.int/Gaia/gdr2/gaia_source_with_rv/csv/GaiaSource_2200921875920933120_3650804325670415744.csv.gz`
# A tibble: 1,000,000 × 3
   source_id    ra   dec
       <dbl> <dbl> <dbl>
 1   2.20e18  339.  60.7
 2   2.20e18  339.  60.7
...

This loads the columns c("source_id", "ra", "dec") into dat, whose elements can be accessed by index, e.g. dat[[1]], or by name, e.g. dat$'http://cdn.gea.esac.esa.int/Ga...csv.gz'.

If that's too large you can work with your data directly within the sapply and only save the filtered data.
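
A minimal sketch of that idea, meant to be used with the root_url and files_csv vectors from above. The helper name read_filtered and the declination cut are assumptions for illustration. Reading source_id as text is also my own choice here: the IDs can exceed 2^53, so storing them as doubles may round them.

```r
library(readr)

# Hypothetical helper: read three columns from one file and keep only
# the rows inside a declination band (degrees).
read_filtered <- function(path, dec_min, dec_max) {
  d <- read_csv(path, progress = FALSE, show_col_types = FALSE,
                col_select = c("source_id", "ra", "dec"),
                # keep the 64-bit IDs as text to avoid rounding
                col_types = cols(source_id = col_character()))
  d[d$dec >= dec_min & d$dec <= dec_max, ]
}

# Usage with the vectors defined above (downloads every file, so it is slow):
# dat_small <- lapply(paste0(root_url, files_csv), read_filtered,
#                     dec_min = 60, dec_max = 65)
# stars <- do.call(rbind, dat_small)
```

Only the filtered rows of each file are kept in memory after the function returns, which is the point of filtering inside the loop.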

Data from GaiaServer

Upvotes: 0
