Woj

Reputation: 469

What is the best way to find the shared index at which two vectors are closest to a 2D point?

I will try to explain my goal the best way possible.

Let's say I have two vectors:

Teffs <- c(6000, 6100, 6200, ..., 7500)
Ls <- c(40, 41, 42, 43, 44, ..., 60)

I want to find which elements of the vectors are closest to the point (6199, 42.1). The index into Teffs MUST be the same as the index into Ls; otherwise I could simply search each vector separately using something like:

values <- c(6199, 42.1)
indexa <- which.min(abs(Teffs - values[1]))
indexb <- which.min(abs(Ls - values[2]))
indexa
3
indexb
3

In this scenario, it is obvious that the third element of each vector gives the values closest to the desired point. However, what if it were more ambiguous? What if the point I wanted to match was (6200, 62) or (6800, 59) or even (6000, 60)? How would I go about this while ensuring that the index in Teffs is the same as the index in Ls?

Upvotes: 0

Views: 54

Answers (2)

Bruno

Reputation: 4150

One way is to scale the vectors first and then compute whatever distance measure you want between each (Teffs, Ls) pair and the target point.

This might be one of the most convoluted pieces of code I have ever written, but it should work nicely if you wrap the logic in a function.

library(tidyverse)
library(recipes)
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#> 
#>     fixed
#> The following object is masked from 'package:stats':
#> 
#>     step
library(philentropy)

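# Candidate points: the i-th elements of Teffs and Ls form one pair
Teffs <- c(6000, 6100, 6200, 7500)
Ls <- c(40, 41, 42, 43)
# Target point to match
obs <- tibble(Teffs = 6199, Ls = 42.1)

df_measures <- tibble(Teffs = Teffs, Ls = Ls)

# Learn the scaling (division by the standard deviation) from the candidates
recipe_to_scale <- recipe(x = df_measures) %>%
  step_scale(everything()) %>%
  prep()

# Scaled candidates
data_for_dist <- recipe_to_scale %>%
  juice()

# Apply the same scaling to the target point
baked_obs <- bake(recipe_to_scale, new_data = obs)

# Append the target as the last row
data_set_to_dist <- data_for_dist %>%
  bind_rows(baked_obs)

# Pairwise distances; the last row holds the distances from the target
resulting_dist <- distance(data_set_to_dist) %>%
  tail(1)
#> Metric: 'euclidean'; comparing: 5 vectors.

# Drop the target's distance to itself, then pick the nearest candidate
position_result <- resulting_dist[1:(length(resulting_dist)-1)] %>%
  which.min()

df_measures[position_result,]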
#> # A tibble: 1 x 2
#>   Teffs    Ls
#>   <dbl> <dbl>
#> 1  6200    42

Created on 2021-01-28 by the reprex package (v0.3.0)
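
For comparison, the same idea (scale each variable, then take the Euclidean distance to the target) can be sketched in base R as a single function. This is only a minimal sketch: the name closest_index and its arguments are made up for illustration, and it assumes that dividing by the standard deviation, which is what step_scale() does, is the scaling you want.

```r
# Hypothetical helper (not part of the answer above): returns the shared index
# of the (x, y) pair closest to `target` after scaling each variable by its
# standard deviation.
closest_index <- function(x, y, target) {
  stopifnot(length(x) == length(y), length(target) == 2)
  # Scaled differences, so that neither variable dominates the distance
  dx <- (x - target[1]) / sd(x)
  dy <- (y - target[2]) / sd(y)
  # Euclidean distance from every pair to the target; keep the smallest
  which.min(sqrt(dx^2 + dy^2))
}

Teffs <- c(6000, 6100, 6200, 7500)
Ls <- c(40, 41, 42, 43)
closest_index(Teffs, Ls, c(6199, 42.1))
#> [1] 3
```

Because a single distance is computed per pair, the returned index is by construction the same for Teffs and Ls.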

Upvotes: 0

dcarlson

Reputation: 11076

Since the two variables are on different scales (L is on a log10 scale, Teff is linear), taking log10 of Teff before computing the distance should work:

Teff <- c(6000, 6100, 6200, 7500)
L <- c(40, 41, 42, 43)
dst <- sqrt((log10(Teff) - log10(6199))^2 + (L - 42.1)^2)
which.min(dst)
# [1] 3

This adjusts the scales, but the range of each variable could still be a problem, since L is about 10 times larger than log10(Teff). That might suggest using L/10.
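
As a rough sketch of that L/10 suggestion, the distance could be computed as follows. The variable target is introduced here only for illustration, and the factor of 10 is the ad hoc choice mentioned above, not a derived weight.

```r
Teff <- c(6000, 6100, 6200, 7500)
L <- c(40, 41, 42, 43)
target <- c(6199, 42.1)  # illustrative name for the point being matched

# log10 on Teff as before; L and its target value are both divided by 10
dst <- sqrt((log10(Teff) - log10(target[1]))^2 + (L / 10 - target[2] / 10)^2)
which.min(dst)
# [1] 3
```

For this target the result is unchanged, but for more ambiguous points the relative weighting of the two variables can decide which index wins.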

Upvotes: 1
