bcongelio
bcongelio

Reputation: 101

Finding All Combinations of a Variable Grouped by Another

I am working with data that looks like this:

# A tibble: 192,988 x 3
   GameID EventID Route               
    <int>   <int> <chr>               
 1   2793      10 NULL                
 2   2793      10 NULL                
 3   2793      10 NULL                
 4   2793      10 NULL                
 5   2793      10 NULL                
 6   2793      10 NULL                
 7   2793      15 Fade - Back Shoulder
 8   2793      15 Curl                
 9   2793      15 Go/Fly              
10   2793      15 Out                 
# … with 192,978 more rows

The GameID correlates to a specific NFL game, and the EventID is a single play within that NFL game. The Route variable is the routes that each WR ran on that specific EventID.

For example: EventID 10 was likely a running play, as no WR routes were charted.

However, in EventID 15, there were four wide receivers on the field. And, as listed, they ran a 'Fade - Back Shoulder', a 'Curl', a 'Go/Fly', and an 'Out.'

What I am needing help with: I am trying to figure out how to find all the unique combinations of the routes ran. So the four listed above in the example data would be a unique combination and then so on and so forth.

So far I have had no luck figuring this out on my own, so I am hoping to find answer here on how to proceed with this.

EDIT

I would like the final outcome to look like this:

Combo.Number     Routes
1                Fade - Back Shoulder, Curl, Go/Fly, Out
2                Slant, Slant, Fade
3                Out, Out, Fade

... and then continuing on for the 192,978 remaining rows in the data.

Upvotes: 0

Views: 70

Answers (3)

akrun
akrun

Reputation: 887098

Useing base R

aggregate(Route ~ ., subset(df, Route != "NULL"), FUN = toString)

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

Perhaps, you can remove all the 'NULL' values from the data and for each GameID and EventID collapse the Route values in one string.

library(dplyr)

df %>%
  filter(Route != 'NULL') %>%
  group_by(GameID, EventID) %>%
  summarise(Route = toString(Route)) %>%
  ungroup

#  GameID EventID Route                               
#   <int>   <int> <chr>                               
#1   2793      15 Fade-BackShoulder, Curl, Go/Fly, Out

Upvotes: 1

ludvigolsen
ludvigolsen

Reputation: 181

Assuming I've understood your situation correctly, this could be a solution:

# Some data
data <- dplyr::tibble(
    "GameID" = rep(1:5, each = 10), 
    "EventID" = rep(1:10, each = 5),
    "Route" = c(NA, NA, NA, NA, NA, 
                "A", "B", "C", "D", "E",
                NA, NA, NA, NA, NA, 
                NA, NA, NA, NA, NA, 
                "A", "B", "T", "G", "E",
                "B", "A", "T", "G", "E",
                "K", "O", "T", "G", "E",
                NA, NA, NA, NA, NA, 
                NA, NA, NA, NA, NA, 
                "A", "B", "C", "D", "E"))

First create a unique event ID (so overall unique, not per game) by grouping the data frame by the two ID columns and assigning the unique group_by ID:

data <- data %>% 
  dplyr::group_by(GameID, EventID) %>% 
  dplyr::mutate(UniqueEventID = dplyr::cur_group_id())

> # A tibble: 50 x 4
> # Groups:   GameID, EventID [10]
>    GameID EventID Route UniqueEventID
>     <int>   <int> <chr>         <int>
>  1      1       1 NA                1
>  2      1       1 NA                1
>  3      1       1 NA                1
>  4      1       1 NA                1
>  5      1       1 NA                1
>  6      1       2 A                 2
>  7      1       2 B                 2
>  8      1       2 C                 2
>  9      1       2 D                 2
> 10      1       2 E                 2
> # … with 40 more rows

Now we create a data frame with the unique combinations:

uniques <- table(data$UniqueEventID, data$Route) %>% 
  as.data.frame.matrix() %>% 
  dplyr::distinct()

>   A B C D E G K O T
> 1 0 0 0 0 0 0 0 0 0
> 2 1 1 1 1 1 0 0 0 0
> 5 1 1 0 0 1 1 0 0 1
> 7 0 0 0 0 1 1 1 1 1

So each row is a combination, with 1 indicating that route being part of the combination.

To then get a list of unique combinations as strings, we can do something like:

recreate_combination <- function(...){
  r <- list(...)
  nms <- names(r)
  vals <- unname(r)
  nms[vals == 1]
}

uniques %>% 
  purrr::pmap(.f = recreate_combination)

> [[1]]
> character(0)
> 
> [[2]]
> [1] "A" "B" "C" "D" "E"
> 
> [[3]]
> [1] "A" "B" "E" "G" "T"
> 
> [[4]]
> [1] "E" "G" "K" "O" "T"

Of course we could remove the all-zeroes row and avoid the character(0) combination etc.

Upvotes: 1

Related Questions