Reputation: 11
I'm trying to add a column to the dataset below that gives the shortest distance (in feet, the unit the coordinates are already in) between each member (Player) of a group (ChanceId) and the other nine members of that group. In other words: how far away is the nearest group member? I need this for every member of each group, and then repeated across thousands of groups (preferably all using dplyr).
Thanks!!
ChanceId <- c("AD885857-5C31-575C-9963-B4D88E320451", "AD885857-5C31-575C-9963-B4D88E320451",
"AD885857-5C31-575C-9963-B4D88E320451", "AD885857-5C31-575C-9963-B4D88E320451",
"AD885857-5C31-575C-9963-B4D88E320451", "AD885857-5C31-575C-9963-B4D88E320451",
"AD885857-5C31-575C-9963-B4D88E320451", "AD885857-5C31-575C-9963-B4D88E320451",
"AD885857-5C31-575C-9963-B4D88E320451", "AD885857-5C31-575C-9963-B4D88E320451",
"AC66722F-2813-5D6D-ABF2-9F6F9834A604", "AC66722F-2813-5D6D-ABF2-9F6F9834A604",
"AC66722F-2813-5D6D-ABF2-9F6F9834A604", "AC66722F-2813-5D6D-ABF2-9F6F9834A604",
"AC66722F-2813-5D6D-ABF2-9F6F9834A604", "AC66722F-2813-5D6D-ABF2-9F6F9834A604",
"AC66722F-2813-5D6D-ABF2-9F6F9834A604", "AC66722F-2813-5D6D-ABF2-9F6F9834A604",
"AC66722F-2813-5D6D-ABF2-9F6F9834A604", "AC66722F-2813-5D6D-ABF2-9F6F9834A604")
Player <- c( "Robert Covington", "Mason Plumlee", "Jusuf Nurkic", "Ish Smith", "Gordon Hayward",
"Norman Powell", "Miles Bridges", "Anfernee Simons", "C.J. McCollum", "Kelly Oubre Jr.",
"Patrick Beverley", "Jarred Vanderbilt", "Karl-Anthony Towns", "Bam Adebayo", "P.J. Tucker Jr.",
"D'Angelo Russell", "Anthony Edwards", "Duncan Robinson", "Kyle Lowry", "Jimmy Butler")
X_RimLocation <- c(-39.77, -40.64, -40.10, -29.98, -34.16, -32.91, -44.77, -37.43, -9.60, -18.24,
-37.39, -39.00, -28.75, -30.88, -27.56, -10.09, -20.64, -38.04, -38.29, -12.51)
Y_RimLocation <- c(-7.97, 4.19, 1.51, 18.03, -22.16, 4.39, 0.33, 3.45, -9.68, -10.78, -19.26, 14.25,
10.11, 7.05, -8.54, 2.64, -16.71, 11.13, -12.34, 3.89)
data <- data.frame(ChanceId, Player, X_RimLocation, Y_RimLocation)
Upvotes: 0
Views: 61
Reputation: 66425
We could do this by joining the data to itself, calculating the distance for each player pairing within each ChanceId, and selecting the minimum for each player.
In this case, since we know there are only 10 players per play, the joined data will only be 10x as big as the original (100 rows per group, 90 once each player's pairing with himself is dropped), so for data with only "thousands" of groups this should be pretty performant.
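library(dplyr)
library(ggplot2) # to show answer graphically

# Pair every player with every other player in the same ChanceId.
# (dplyr >= 1.1.0 warns about a many-to-many join here; that is expected.)
dists <- data %>%
  left_join(data, by = "ChanceId") %>%
  filter(Player.x != Player.y) %>%   # drop each player's pairing with himself
  mutate(dist = sqrt((X_RimLocation.x - X_RimLocation.y)^2 +
                     (Y_RimLocation.x - Y_RimLocation.y)^2)) %>%
  group_by(ChanceId, Player.x) %>%
  summarize(min_dist = min(dist), .groups = "drop")   # nearest group member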

# Attach min_dist to the original rows as a new column, then plot it.
data %>%
  left_join(dists, by = c("ChanceId", "Player" = "Player.x")) %>%
  ggplot(aes(X_RimLocation, Y_RimLocation, color = min_dist)) +
  geom_point() +
  geom_text(aes(label = Player), hjust = 0) +
  facet_wrap(~ChanceId)
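
If you'd rather skip the self-join, here is a rough sketch of an alternative that adds the column directly with a grouped mutate(). It builds each group's pairwise distance matrix with base R's dist(). The toy data frame and the nearest_dist() helper are made up for illustration and are not part of the question's data; the column names are borrowed from it.

```r
library(dplyr)

# Hypothetical toy data: two groups of three points each.
toy <- data.frame(
  ChanceId      = rep(c("A", "B"), each = 3),
  Player        = c("p1", "p2", "p3", "p4", "p5", "p6"),
  X_RimLocation = c(0, 3, 0, 0, 0, 10),
  Y_RimLocation = c(0, 4, 1, 0, 2, 0)
)

# Hypothetical helper: distance from each point to its nearest neighbour
# among the points passed in (i.e. within one group).
nearest_dist <- function(x, y) {
  d <- as.matrix(dist(cbind(x, y)))  # full pairwise Euclidean distance matrix
  diag(d) <- Inf                     # ignore each point's distance to itself
  unname(apply(d, 1, min))           # smallest remaining distance per row
}

toy_with_dist <- toy %>%
  group_by(ChanceId) %>%
  mutate(min_dist = nearest_dist(X_RimLocation, Y_RimLocation)) %>%
  ungroup()

print(toy_with_dist)
```

On the real data the same group_by(ChanceId) %>% mutate(...) call should work as-is with data in place of toy. It doesn't rely on Player names being unique within a group, which the filter(Player.x != Player.y) step in the join version does.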
Upvotes: 1