Woj

R: interpolate a value from a data frame based on two inputs

I have a data frame that looks like this:

  Teff logg M_div_H       U       B      V      R      I     J     H     K     L Lprime     M
1 2000  4.0    -0.1 -13.443 -11.390 -7.895 -4.464 -1.831 1.666 3.511 2.701 4.345  4.765 5.680
2 2000  4.5    -0.1 -13.402 -11.416 -7.896 -4.454 -1.794 1.664 3.503 2.728 4.352  4.772 5.687
3 2000  5.0    -0.1 -13.358 -11.428 -7.888 -4.431 -1.738 1.664 3.488 2.753 4.361  4.779 5.685
4 2000  5.5    -0.1 -13.220 -11.079 -7.377 -4.136 -1.483 1.656 3.418 2.759 4.355  4.753 5.638
5 2200  3.5    -0.1 -11.866  -9.557 -6.378 -3.612 -1.185 1.892 3.294 2.608 3.929  4.289 4.842
6 2200  4.5    -0.1 -11.845  -9.643 -6.348 -3.589 -1.132 1.874 3.310 2.648 3.947  4.305 4.939
...

Let's say I have two values:

input_Teff = 4.8529282904170595E+003
input_log_g = 1.9241934741026787E+000

Notice that each V value corresponds to a unique (Teff, logg) combination. From the input values, I would like to interpolate a value for V. Is there a way to do this in R?

Edit 1: Here is the link to the full data frame: https://www.dropbox.com/s/prbceabxmd25etx/lcb98cor.dat?dl=0

Upvotes: 0

Views: 215

Answers (2)

Robert Hijmans

Building on Ian Campbell's observation that you can treat your data as points on a two-dimensional plane, you can use spatial interpolation methods. One of the simplest is inverse-distance weighting (IDW), which you can implement with gstat like this:

library(data.table) 
d <- fread("https://www.dropbox.com/s/prbceabxmd25etx/lcb98cor.dat?dl=1")
setnames(d,"#Teff","Teff")

First, rescale the data as appropriate (not shown here; see Ian's answer).
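A minimal base-R sketch of that rescaling step, assuming the same min-max rescaling as in Ian's answer. So that it runs offline, it uses only the Teff, logg and V columns of the six rows printed in the question, and the points in `newd` are hypothetical. The key point is that the new points are scaled with the *data's* range, not their own.

```r
# Same min-max rescaling as in Ian's answer: scale x by the range of y
rescale <- function(x, y) (x - min(y)) / (max(y) - min(y))

# Toy stand-in for `d`: Teff, logg and V from the six rows printed in the question
d <- data.frame(
  Teff = c(2000, 2000, 2000, 2000, 2200, 2200),
  logg = c(4.0, 4.5, 5.0, 5.5, 3.5, 4.5),
  V    = c(-7.895, -7.896, -7.888, -7.377, -6.378, -6.348)
)
# Hypothetical points to predict to
newd <- data.frame(Teff = c(2150, 2050), logg = c(4.4, 5.2))

# Scale the data AND the new points with the data's range, so both
# end up in the same coordinate system
for (v in c("Teff", "logg")) {
  d[[paste0(v, ".scale")]]    <- rescale(d[[v]], d[[v]])
  newd[[paste0(v, ".scale")]] <- rescale(newd[[v]], d[[v]])
}
```

With the scaled columns in place, the gstat call would use `locations = ~Teff.scale + logg.scale` instead of `~Teff + logg`, and `predict(idw, newd)` works as before because `newd` carries the same scaled columns.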

library(gstat)
# define the IDW interpolator: 7 nearest neighbours, inverse-distance power idp = 0.5
idw <- gstat(id="V", formula = V~1, locations = ~Teff+logg, data=d, nmax=7, set=list(idp = .5))

# new "points" to predict to 
newd <- data.frame(Teff=c(4100, 4852.928), logg=c(1.5, 1.9241934741026787))

p <- predict(idw, newd)
#[inverse distance weighted interpolation]
p$V.pred
#[1] -0.9818571 -0.3602857
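To make it clearer what such an interpolator computes, here is a minimal base-R sketch of inverse-distance weighting: each prediction is a weighted mean of the `nmax` nearest observations, with weights 1/distance^`idp`. This is an illustration under stated assumptions (plain Euclidean distance on min-max rescaled coordinates, toy data from the six rows in the question, a hypothetical query point), not gstat's actual code, and `idw_predict()` is a hypothetical helper name.

```r
# Hypothetical helper: inverse-distance-weighted prediction at (x0, y0)
idw_predict <- function(x, y, z, x0, y0, idp = 0.5, nmax = 7) {
  vapply(seq_along(x0), function(i) {
    dist <- sqrt((x - x0[i])^2 + (y - y0[i])^2)   # Euclidean distances
    # At an observed location the weight would be infinite: return that value
    if (any(dist == 0)) return(z[which(dist == 0)[1]])
    nearest <- order(dist)[seq_len(min(nmax, length(dist)))]  # nmax closest
    w <- 1 / dist[nearest]^idp                                # IDW weights
    sum(w * z[nearest]) / sum(w)                              # weighted mean
  }, numeric(1))
}

# Toy data: Teff, logg and V from the six rows printed in the question
d <- data.frame(
  Teff = c(2000, 2000, 2000, 2000, 2200, 2200),
  logg = c(4.0, 4.5, 5.0, 5.5, 3.5, 4.5),
  V    = c(-7.895, -7.896, -7.888, -7.377, -6.378, -6.348)
)
rescale <- function(x, y) (x - min(y)) / (max(y) - min(y))
sx <- rescale(d$Teff, d$Teff)
sy <- rescale(d$logg, d$logg)

# Hypothetical query point (Teff = 2150, logg = 4.4), scaled with the data's range
pred <- idw_predict(sx, sy, d$V, rescale(2150, d$Teff), rescale(4.4, d$logg))
pred
```

With `nmax = 1` the sketch reduces to a nearest-neighbour lookup, which is what Ian's answer does; a small `idp` such as 0.5 spreads the weight more evenly across the neighbours than the more common power of 2.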

For higher dimensions you could use fields::Tps (I think you can force it to be an exact method, that is, one that exactly honors the observations, by making each observation a node).

Upvotes: 2

Ian Campbell

We can think of Teff and logg as coordinates in a two-dimensional plane. Your input point lies in that same space:

library(tidyverse)
ggplot(data,aes(x = Teff, y = logg)) +
  geom_point() +
  geom_point(data = data.frame(Teff = 4.8529282904170595e3, logg = 1.9241934741026787),
             color = "orange")


However, the scales of Teff and logg are not the same. Simply taking log(Teff) gets us pretty close, but not quite, so instead we can rescale both variables to lie between 0 and 1. For that we create a custom rescale function; it will become clear in a moment why we can't use scales::rescale.

rescale = function(x,y){(x - min(y))/(max(y)-min(y))}
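A small illustration of why the function takes the reference vector `y` as a separate argument: a single input point has to be scaled with the data's minimum and maximum. Scaling a lone value by its own range divides zero by zero. The `teff` vector below is just a few Teff values from the rows shown in the question.

```r
rescale = function(x,y){(x - min(y))/(max(y)-min(y))}

# A few Teff values taken from the rows printed in the question
teff <- c(2000, 2000, 2200, 2200)

# Scaling a lone value by its own range: (x - x) / (x - x) is 0/0, i.e. NaN
rescale(2100, 2100)
#[1] NaN

# Scaling the same value by the data's range places it on the data's 0-1 scale
rescale(2100, teff)
#[1] 0.5
```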

We can now rescale the data:

data %>% 
  mutate(Teff.scale = rescale(Teff,Teff),
         logg.scale = rescale(logg,logg)) -> data

From here, we might use raster::pointDistance to calculate the distance from the input point to all of the scaled values:

raster::pointDistance(cbind(rescale(input_Teff,data$Teff),rescale(input_log_g,data$logg)),
                      data[,c("Teff.scale","logg.scale")],
                      lonlat = FALSE)
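With `lonlat = FALSE`, `raster::pointDistance` returns plain Euclidean (planar) distances, so the same step can be sketched in base R without the raster dependency. `euclid_dist()` is a hypothetical helper, and the three points below are made up for illustration.

```r
# Hypothetical helper: Euclidean distance from point p to every row of m
euclid_dist <- function(p, m) {
  m <- as.matrix(m)
  # sweep() subtracts p from each row; then take the root of the row-wise sum of squares
  sqrt(rowSums(sweep(m, 2, p)^2))
}

# Made-up scaled points and a made-up scaled query point
pts <- cbind(Teff.scale = c(0, 0.25, 1), logg.scale = c(0, 0.5, 1))
dists <- euclid_dist(c(0.2, 0.4), pts)
which.min(dists)   # index of the closest row
#[1] 2
```

On the real data the call would be `euclid_dist(c(rescale(input_Teff, data$Teff), rescale(input_log_g, data$logg)), data[, c("Teff.scale", "logg.scale")])`.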

We can use which.min to find the row with the minimum distance:

data[which.min(raster::pointDistance(cbind(rescale(input_Teff,data$Teff),rescale(input_log_g,data$logg)),
                                     data[,c("Teff.scale","logg.scale")],
                                     lonlat = FALSE)),]
   Teff logg M_div_H      U      B      V     R     I     J     H     K     L Lprime     M Teff.scale logg.scale
1: 4750    2    -0.1 -2.447 -1.438 -0.355 0.159 0.589 1.384 1.976 1.881 2.079  2.083 2.489 0.05729167  0.4631902

Here we can visualize the result:

ggplot(data,aes(x = Teff.scale, y = logg.scale)) +
  geom_point() +
  geom_point(data = data[which.min(raster::pointDistance(cbind(rescale(input_Teff,data$Teff),rescale(input_log_g,data$logg)),data[,c("Teff.scale","logg.scale")], FALSE)),],
             color = "blue") +
  geom_point(data = data.frame(Teff.scale = rescale(input_Teff,data$Teff),logg.scale = rescale(input_log_g,data$logg)),
             color = "orange")


Finally, we can extract the corresponding value of V:

data[which.min(raster::pointDistance(cbind(rescale(input_Teff,data$Teff),rescale(input_log_g,data$logg)),data[,c("Teff.scale","logg.scale")], FALSE)),"V"]
        V
1: -0.355
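The `which.min(raster::pointDistance(...))` expression is repeated several times above. Here is a minimal sketch of a hypothetical helper, `nearest_V()`, that does the scaling and the distance computation once and returns V for the closest grid point. It assumes the columns are named Teff, logg and V as in the question, uses base R only, and is exercised on the six rows printed in the question. Like the steps above, it returns the nearest grid value rather than a true interpolation.

```r
# Hypothetical helper: V at the grid point nearest to (teff, logg)
nearest_V <- function(teff, logg, data) {
  # Min-max rescale x using the range of the reference vector y
  rescale <- function(x, y) (x - min(y)) / (max(y) - min(y))
  # Scale the data and the query point with the data's ranges
  sx <- rescale(data$Teff, data$Teff)
  sy <- rescale(data$logg, data$logg)
  qx <- rescale(teff, data$Teff)
  qy <- rescale(logg, data$logg)
  # Euclidean distance to every row, computed once
  i <- which.min(sqrt((sx - qx)^2 + (sy - qy)^2))
  data$V[i]
}

# Toy data: Teff, logg and V from the six rows printed in the question
toy <- data.frame(
  Teff = c(2000, 2000, 2000, 2000, 2200, 2200),
  logg = c(4.0, 4.5, 5.0, 5.5, 3.5, 4.5),
  V    = c(-7.895, -7.896, -7.888, -7.377, -6.378, -6.348)
)
nearest_V(2150, 4.4, toy)
#[1] -6.348
```

On the full table this would be `nearest_V(input_Teff, input_log_g, data)`.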

Data:

library(data.table)
data <- fread("https://www.dropbox.com/s/prbceabxmd25etx/lcb98cor.dat?dl=1")
setnames(data,"#Teff","Teff")
input_Teff = 4.8529282904170595E+003
input_log_g = 1.9241934741026787E+000

Upvotes: 2
