Reputation: 469
I will try to explain my goal the best way possible.
Let's say I have two vectors:
Teffs <- c(6000, 6100, 6200, ..., 7500)
Ls <- c(40, 41, 42, 43, 44, ..., 60)
I want to find which elements of the vectors are closest to the point (6199, 42.1). The index into Teffs MUST be the same as the index into Ls; otherwise I could simply search each vector separately with something like:
values <- c(6199, 42.1)
indexa <- which.min(abs(Teffs - values[1]))
indexb <- which.min(abs(Ls - values[2]))
indexa
3
indexb
3
In this scenario, it is obvious that the third elements of the vectors give the closest values to the desired point. However, what if it were more ambiguous? What if the point I wanted to match was (6200, 62) or (6800, 59) or even (6000, 60)? How would I go about this while ensuring that the index in Teffs is the same as the index in Ls?
Upvotes: 0
Reputation: 4150
One approach is to scale both vectors first and then compute whatever distance measure you like between each pair and the target point.
This may be one of the most convoluted pieces of code I have written, but it should work well if you wrap the logic in a function.
library(tidyverse)
library(recipes)
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#>
#> fixed
#> The following object is masked from 'package:stats':
#>
#> step
library(philentropy)
Teffs <- c(6000, 6100, 6200, 7500)
Ls <- c(40, 41, 42, 43)
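# Target point, as a one-row tibble with the same column names
obs <- tibble(Teffs = 6199, Ls = 42.1)
df_measures <- tibble(Teffs = Teffs, Ls = Ls)
# Learn the scaling (divide each column by its standard deviation)
recipe_to_scale <- recipe(x = df_measures) %>%
  step_scale(everything()) %>%
  prep()
data_for_dist <- recipe_to_scale %>%
  juice()
# Apply the same scaling to the target point
baked_obs <- bake(recipe_to_scale, new_data = obs)
data_set_to_dist <- data_for_dist %>%
  bind_rows(baked_obs)
# Pairwise distances; the last row holds the distances from the target point
resulting_dist <- distance(data_set_to_dist) %>%
  tail(1)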
#> Metric: 'euclidean'; comparing: 5 vectors.
# Drop the last entry (the target's distance to itself) before taking the minimum
position_result <- resulting_dist[1:(length(resulting_dist) - 1)] %>%
  which.min()
df_measures[position_result,]
#> # A tibble: 1 x 2
#> Teffs Ls
#> <dbl> <dbl>
#> 1 6200 42
Created on 2021-01-28 by the reprex package (v0.3.0)
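The same scale-then-measure idea can be sketched in base R without extra packages. This is only an illustration, not the code from this answer: `closest_pair` is a hypothetical helper name, and it assumes that dividing each variable by its standard deviation and using the Euclidean distance is an acceptable choice for your data.

```r
# Hypothetical helper (name and signature are assumptions for illustration):
# returns the single index i for which (Teffs[i], Ls[i]) is closest to `target`
# after each variable has been divided by its own standard deviation.
closest_pair <- function(Teffs, Ls, target) {
  stopifnot(length(Teffs) == length(Ls), length(target) == 2)
  s <- c(sd(Teffs), sd(Ls))            # per-variable scale factors
  d <- sqrt(((Teffs - target[1]) / s[1])^2 +
            ((Ls    - target[2]) / s[2])^2)
  which.min(d)                         # one index shared by both vectors
}

Teffs <- c(6000, 6100, 6200, 7500)
Ls <- c(40, 41, 42, 43)
closest_pair(Teffs, Ls, c(6199, 42.1))
# [1] 3
```

Because the distance is computed per pair, the returned index is by construction the same for Teffs and Ls.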
Upvotes: 0
Reputation: 11076
Since the scales are different (L is on a log10 scale, Teff is linear), taking log10 of Teff first puts both on a log scale, so this should work:
Teff <- c(6000, 6100, 6200, 7500)
L <- c(40, 41, 42, 43)
dst <- sqrt((log10(Teff) - log10(6199))^2 + (L - 42.1)^2)
which.min(dst)
# [1] 3
This adjusts the scales, but the range of each variable could still be a problem, since L is about 10 times larger than log10(Teff). That might suggest using L/10.
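As a rough sketch of the L/10 suggestion, the distance could be computed as below. The `target` vector is a name introduced here for illustration, and whether dividing by 10 is the right weighting depends on the actual ranges in your data.

```r
Teff <- c(6000, 6100, 6200, 7500)
L <- c(40, 41, 42, 43)
target <- c(6199, 42.1)   # assumed target point (Teff, L)

# L / 10 brings L to roughly the same magnitude as log10(Teff) (about 3.8),
# so neither term dominates the distance purely because of its units.
dst <- sqrt((log10(Teff) - log10(target[1]))^2 +
            (L / 10 - target[2] / 10)^2)
which.min(dst)
# [1] 3
```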
Upvotes: 1