Reputation: 619
I am working with a data frame resembling the following:
I am looking to create columns that identify the
name
and x-location
value of the nearest x-location
for every observation within each Day
. So the above table would become:
Where closest_x
is the nearest x_location
value to that row's x-location
within the same day. closest_x_name
is the name of the closest person. second_closest_x
is the value of the second nearest x_location
within the column and so on. Note that Harry
still appears twice in the second day, but only appears once amongst the closest, second_closest, etc. columns...we keep all his observations, but only rank him once.
I figured one way to approach this would be to use which.min.
library(tidyverse)
dat1 <- dat %>%
group_by(Day) %>%
mutate(x_closest = which.min(abs(x_location -x_location)))
But such approach doesn't work for a number of reasons.
structure(list(Day = c("Monday", "Monday", "Monday", "Monday",
"Monday", "Tuesday", "Tuesday", "Tuesday", "Tuesday", "Tuesday",
"Tuesday"), Name = c("Steve", "Joe", "Brian", "Mike", "Harry",
"Steve", "Joe", "Brian", "Mike", "Harry", "Harry"), x_location = c(2.5,
6, 4.5, 8, 5, 1, 3.5, 4, 9.5, 10, 10), closest_x = c(4.5, 5,
5, 6, 4.5, 3.5, 4, 3.5, 10, 9.5, 9.5), closest_x_name = c("Brian",
"Harry", "Harry", "Joe", "Brian", "Joe", "Brian", "Joe", "Harry",
"Mike", "Mike"), second_closest_x = c(5, 4.5, 6, 5, 6, 4, 1,
1, 4, 4, 4), second_closest_name = c("Harry", "Brian", "Joe",
"Harry", "Joe", "Brian", "Steve", "Steve", "Brian", "Brian",
"Brian"), third_closest_x = c(6, 8, 2.5, 4.5, 2.5, 9.5, 9.5,
9.5, 3.5, 3.5, 3.5), third_closest_name = c("Joe", "Mike", "Steve",
"Brian", "Steve", "Mike", "Mike", "Mike", "Joe", "Joe", "Joe"
)), class = "data.frame", row.names = c(NA, -11L))
Upvotes: 2
Views: 92
Reputation: 17309
Here’s a tidyverse-based solution:
library(dplyr)
library(purrr)
library(tidyr)
closest_rows <- function(x_location_cur, Name_cur) {
closest <- cur_data()$Name[order(abs(cur_data()$x_location - x_location_cur))]
tibble(
dist = c("closest", "second_closest", "third_closest"),
name = unique(closest[closest != Name_cur])[1:3],
x = cur_data()$x_location[match(name, cur_data()$Name)]
)
}
dat %>%
group_by(Day) %>%
mutate(
row = row_number(),
dists = map2(x_location, Name, closest_rows)
) %>%
unnest(dists) %>%
pivot_wider(
names_from = dist,
values_from = name:x,
names_glue = "{dist}_{.value}",
names_vary = "slowest"
) %>%
ungroup() %>%
select(!row)
# A tibble: 11 × 9
Day Name x_location closest_name closest_x second_…¹ secon…² third…³ third…⁴
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 Monday Steve 2.5 Brian 4.5 Harry 5 Joe 6
2 Monday Joe 6 Harry 5 Brian 4.5 Mike 8
3 Monday Brian 4.5 Harry 5 Joe 6 Steve 2.5
4 Monday Mike 8 Joe 6 Harry 5 Brian 4.5
5 Monday Harry 5 Brian 4.5 Joe 6 Steve 2.5
6 Tuesday Steve 1 Joe 3.5 Brian 4 Mike 9.5
7 Tuesday Joe 3.5 Brian 4 Steve 1 Mike 9.5
8 Tuesday Brian 4 Joe 3.5 Steve 1 Mike 9.5
9 Tuesday Mike 9.5 Harry 10 Brian 4 Joe 3.5
10 Tuesday Harry 10 Mike 9.5 Brian 4 Joe 3.5
11 Tuesday Harry 10 Mike 9.5 Brian 4 Joe 3.5
# … with abbreviated variable names ¹second_closest_name, ²second_closest_x,
# ³third_closest_name, ⁴third_closest_x
Walkthrough:
For each row, the closest_rows()
helper function gets indices of other rows in the group ordered by distance; removes rows with duplicate names or with the same name as the current row; then returns a tibble with names and x locations for the three closest matches. This yields a list column, which you can unnest, then pivot wider to get your desired format.
Upvotes: 1