Reputation: 101
I am working with data that looks like this:
# A tibble: 192,988 x 3
GameID EventID Route
<int> <int> <chr>
1 2793 10 NULL
2 2793 10 NULL
3 2793 10 NULL
4 2793 10 NULL
5 2793 10 NULL
6 2793 10 NULL
7 2793 15 Fade - Back Shoulder
8 2793 15 Curl
9 2793 15 Go/Fly
10 2793 15 Out
# … with 192,978 more rows
The GameID
correlates to a specific NFL game, and the EventID
is a single play within that NFL game. The Route
variable is the routes that each WR ran on that specific EventID
.
For example: EventID
10 was likely a running play, as no WR routes were charted.
However, in EventID
15, there were four wide receivers on the field. And, as listed, they ran a 'Fade - Back Shoulder', a 'Curl', a 'Go/Fly', and an 'Out.'
What I am needing help with: I am trying to figure out how to find all the unique combinations of the routes ran. So the four listed above in the example data would be a unique combination and then so on and so forth.
So far I have had no luck figuring this out on my own, so I am hoping to find answer here on how to proceed with this.
EDIT
I would like the final outcome to look like this:
Combo.Number Routes
1 Fade - Back Shoulder, Curl, Go/Fly, Out
2 Slant, Slant, Fade
3 Out, Out, Fade
... and then continuing on for the 192,978 remaining rows in the data.
Upvotes: 0
Views: 70
Reputation: 887098
Useing base R
aggregate(Route ~ ., subset(df, Route != "NULL"), FUN = toString)
Upvotes: 1
Reputation: 388982
Perhaps, you can remove all the 'NULL'
values from the data and for each GameID
and EventID
collapse the Route
values in one string.
library(dplyr)
df %>%
filter(Route != 'NULL') %>%
group_by(GameID, EventID) %>%
summarise(Route = toString(Route)) %>%
ungroup
# GameID EventID Route
# <int> <int> <chr>
#1 2793 15 Fade-BackShoulder, Curl, Go/Fly, Out
Upvotes: 1
Reputation: 181
Assuming I've understood your situation correctly, this could be a solution:
# Some data
data <- dplyr::tibble(
"GameID" = rep(1:5, each = 10),
"EventID" = rep(1:10, each = 5),
"Route" = c(NA, NA, NA, NA, NA,
"A", "B", "C", "D", "E",
NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA,
"A", "B", "T", "G", "E",
"B", "A", "T", "G", "E",
"K", "O", "T", "G", "E",
NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA,
"A", "B", "C", "D", "E"))
First create a unique event ID (so overall unique, not per game) by grouping the data frame by the two ID columns and assigning the unique group_by ID:
data <- data %>%
dplyr::group_by(GameID, EventID) %>%
dplyr::mutate(UniqueEventID = dplyr::cur_group_id())
> # A tibble: 50 x 4
> # Groups: GameID, EventID [10]
> GameID EventID Route UniqueEventID
> <int> <int> <chr> <int>
> 1 1 1 NA 1
> 2 1 1 NA 1
> 3 1 1 NA 1
> 4 1 1 NA 1
> 5 1 1 NA 1
> 6 1 2 A 2
> 7 1 2 B 2
> 8 1 2 C 2
> 9 1 2 D 2
> 10 1 2 E 2
> # … with 40 more rows
Now we create a data frame with the unique combinations:
uniques <- table(data$UniqueEventID, data$Route) %>%
as.data.frame.matrix() %>%
dplyr::distinct()
> A B C D E G K O T
> 1 0 0 0 0 0 0 0 0 0
> 2 1 1 1 1 1 0 0 0 0
> 5 1 1 0 0 1 1 0 0 1
> 7 0 0 0 0 1 1 1 1 1
So each row is a combination, with 1 indicating that route being part of the combination.
To then get a list of unique combinations as strings, we can do something like:
recreate_combination <- function(...){
r <- list(...)
nms <- names(r)
vals <- unname(r)
nms[vals == 1]
}
uniques %>%
purrr::pmap(.f = recreate_combination)
> [[1]]
> character(0)
>
> [[2]]
> [1] "A" "B" "C" "D" "E"
>
> [[3]]
> [1] "A" "B" "E" "G" "T"
>
> [[4]]
> [1] "E" "G" "K" "O" "T"
Of course we could remove the all-zeroes row and avoid the character(0)
combination etc.
Upvotes: 1