Alexa Fredston
Alexa Fredston

Reputation: 727

dplyr mutate column with nearest value in external list

I'm trying to mutate a column and populate it with exact matches from a list if those occur, and if not, the closest match possible.

My data frame looks like this:

index <- seq(1, 10, 1)
blockID <- c(100, 120, 132, 133, 201, 207, 210, 238, 240, 256)
df <- as.data.frame(cbind(index, blockID))

   index blockID
1      1     100
2      2     120
3      3     132
4      4     133
5      5     201
6      6     207
7      7     210
8      8     238
9      9     240
10    10     256

I want to mutate a new column that checks whether blockID is in a list. If yes, it should just keep the value of blockID. If not, It should return the nearest value in blocklist:

blocklist <- c(100, 120, 130, 150, 201, 205, 210, 238, 240, 256) 

so the additional column should contain

100 (match), 
120 (match), 
130 (no match for 132--nearest value is 130), 
130 (no match for 133--nearest value is 130), 
201, 
205 (no match for 207--nearest value is 205), 
210, 
238, 
240, 
256 

Here's what I've tried:

df2 <- df %>% mutate(blockmatch = ifelse(blockID %in% blocklist, blockID, ifelse(match.closest(blockID, blocklist, tolerance = Inf), "missing")))

I just put in "missing" to complete the ifelse() statements, but it shouldn't actually be returned anywhere since the preceding cases will be fulfilled for every value of blockID. However, the resulting df2 just has "missing" in all the cells where it should have substituted the nearest number. I know there are base R alternatives to match.closest but I'm not sure that's the problem. Any ideas?

Upvotes: 0

Views: 1293

Answers (2)

Daniel Vartanian
Daniel Vartanian

Reputation: 82

Here’s a fast and efficient dplyr solution for this problem:

index <- seq(1, 10, 1)
blockID <- c(100, 120, 132, 133, 201, 207, 210, 238, 240, 256)
df <- as.data.frame(cbind(index, blockID))

blocklist <- c(100, 120, 130, 150, 201, 205, 210, 238, 240, 256)

df |>
  dplyr::left_join(
    dplyr::tibble(nearest_value_in_blocklist = blocklist), 
    by = dplyr::join_by(closest(blockID >= nearest_value_in_blocklist))
  ) |>
  dplyr::mutate(
    new_column = dplyr::if_else(
      blockID %in% blocklist, 
      blockID, 
      nearest_value_in_blocklist
    )
  )
#>    index blockID nearest_value_in_blocklist new_column
#> 1      1     100                        100        100
#> 2      2     120                        120        120
#> 3      3     132                        130        130
#> 4      4     133                        130        130
#> 5      5     201                        201        201
#> 6      6     207                        205        205
#> 7      7     210                        210        210
#> 8      8     238                        238        238
#> 9      9     240                        240        240
#> 10    10     256                        256        256

Created on 2024-11-23 with reprex v2.1.1

Upvotes: 0

Shree
Shree

Reputation: 11150

You don't need if..else. Your rule can simplified by saying that we always get the blocklist element with least absolute difference when compared to blockID. If values match then absolute difference is 0 (which will always be the least).

With that here's a simple base R solution -

df$blockmatch <- sapply(df$blockID, function(x) blocklist[order(abs(x - blocklist))][1])

   index blockID blockmatch
1      1     100        100
2      2     120        120
3      3     132        130
4      4     133        130
5      5     201        201
6      6     207        205
7      7     210        210
8      8     238        238
9      9     240        240
10    10     256        256

Here are a couple of ways with dplyr -

df %>% 
  rowwise() %>% 
  mutate(
    blockmatch = blocklist[order(abs(blockID - blocklist))][1]
  )

df %>% 
  mutate(
    blockmatch = sapply(blockID, function(x) blocklist[order(abs(x - blocklist))][1])
  )

Thanks to @Onyambu, here's a faster way -

df$blockmatch <- blocklist[max.col(-abs(sapply(blocklist, '-', df$blockID)))]

Upvotes: 4

Related Questions