Reputation: 727
I'm trying to mutate a column and populate it with exact matches from a list if those occur, and if not, the closest match possible.
My data frame looks like this:
index <- seq(1, 10, 1)
blockID <- c(100, 120, 132, 133, 201, 207, 210, 238, 240, 256)
df <- as.data.frame(cbind(index, blockID))
index blockID
1 1 100
2 2 120
3 3 132
4 4 133
5 5 201
6 6 207
7 7 210
8 8 238
9 9 240
10 10 256
I want to mutate
a new column that checks whether blockID
is in a list. If yes, it should just keep the value of blockID
. If not, It should return the nearest value in blocklist
:
blocklist <- c(100, 120, 130, 150, 201, 205, 210, 238, 240, 256)
so the additional column should contain
100 (match),
120 (match),
130 (no match for 132--nearest value is 130),
130 (no match for 133--nearest value is 130),
201,
205 (no match for 207--nearest value is 205),
210,
238,
240,
256
Here's what I've tried:
df2 <- df %>% mutate(blockmatch = ifelse(blockID %in% blocklist, blockID, ifelse(match.closest(blockID, blocklist, tolerance = Inf), "missing")))
I just put in "missing"
to complete the ifelse()
statements, but it shouldn't actually be returned anywhere since the preceding cases will be fulfilled for every value of blockID
. However, the resulting df2 just has "missing" in all the cells where it should have substituted the nearest number. I know there are base R alternatives to match.closest
but I'm not sure that's the problem. Any ideas?
Upvotes: 0
Views: 1293
Reputation: 82
Here’s a fast and efficient dplyr
solution for this problem:
index <- seq(1, 10, 1)
blockID <- c(100, 120, 132, 133, 201, 207, 210, 238, 240, 256)
df <- as.data.frame(cbind(index, blockID))
blocklist <- c(100, 120, 130, 150, 201, 205, 210, 238, 240, 256)
df |>
dplyr::left_join(
dplyr::tibble(nearest_value_in_blocklist = blocklist),
by = dplyr::join_by(closest(blockID >= nearest_value_in_blocklist))
) |>
dplyr::mutate(
new_column = dplyr::if_else(
blockID %in% blocklist,
blockID,
nearest_value_in_blocklist
)
)
#> index blockID nearest_value_in_blocklist new_column
#> 1 1 100 100 100
#> 2 2 120 120 120
#> 3 3 132 130 130
#> 4 4 133 130 130
#> 5 5 201 201 201
#> 6 6 207 205 205
#> 7 7 210 210 210
#> 8 8 238 238 238
#> 9 9 240 240 240
#> 10 10 256 256 256
Created on 2024-11-23 with reprex v2.1.1
Upvotes: 0
Reputation: 11150
You don't need if..else
. Your rule can simplified by saying that we always get the blocklist
element with least absolute difference when compared to blockID
. If values match then absolute difference is 0 (which will always be the least).
With that here's a simple base R solution -
df$blockmatch <- sapply(df$blockID, function(x) blocklist[order(abs(x - blocklist))][1])
index blockID blockmatch
1 1 100 100
2 2 120 120
3 3 132 130
4 4 133 130
5 5 201 201
6 6 207 205
7 7 210 210
8 8 238 238
9 9 240 240
10 10 256 256
Here are a couple of ways with dplyr
-
df %>%
rowwise() %>%
mutate(
blockmatch = blocklist[order(abs(blockID - blocklist))][1]
)
df %>%
mutate(
blockmatch = sapply(blockID, function(x) blocklist[order(abs(x - blocklist))][1])
)
Thanks to @Onyambu, here's a faster way -
df$blockmatch <- blocklist[max.col(-abs(sapply(blocklist, '-', df$blockID)))]
Upvotes: 4