Count Unique Combinations Meeting Specific Criteria

Question

The Problem:

I would like to count the number of unique 5-player combinations n, which meet the criteria described below, for each team using the following data.

The Data:

TEAM <- c("A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B")
PLAYER <- c("Will","Will","Roy","Roy","Jaylon","Dean","Yosef","Devan","Quincy","Quincy","Luis","Xzavier","Seth","Layne","Layne","Antwan")
LP <- c(1,1,2,2,3,4,5,6,1,1,2,3,4,5,5,6)
POS <- c("3B","OF","1B","OF","SS","OF","C","OF","2B","OF","OF","C","3B","1B","OF","SS")
df <- data.frame(TEAM,PLAYER,LP,POS)

df:

    TEAM  PLAYER   LP  POS
 1  A     Will     1   3B
 2  A     Will     1   OF
 3  A     Roy      2   1B
 4  A     Roy      3   OF
 5  A     Jaylon   3   SS
 6  A     Dean     4   OF
 7  A     Yosef    5   C
 8  A     Devan    6   OF
 9  B     Quincy   1   2B
10  B     Quincy   1   OF
11  B     Luis     2   OF
12  B     Xzavier  3   C
13  B     Seth     4   3B
14  B     Layne    5   1B
15  B     Layne    5   OF
16  B     Antwan   6   SS

Edit: The LP column is irrelevant to the output. That wasn't as clear as I would have liked it to be in the original post.

The Criteria:

Five unique players PLAYER must be used (one will always be left out, as there are six players in the pool available for each team).
Each position POS may only be used once with the exception of OF, which may be used up to three times OF <= 3.
Combinations may not use players PLAYER from multiple teams TEAM.

For Example:

These are just a few of the many possible combinations I am looking to create/count:

   TEAM  1          2          3          4         5
1  A     Will-OF    Roy-1B     Jaylon-SS  Dean-OF   Devan-OF
2  A     Roy-OF     Jaylon-SS  Dean-OF    Yosef-C   Devan-OF
3  A     Will-3B    Roy-OF     Jaylon-SS  Dean-OF   Yosef-C
...
n  A     Will-3B    Roy-1B     Jaylon-SS  Dean-OF   Yosef-C       

   TEAM  1          2          3          4         5
1  B     Quincy-2B  Luis-OF    Xzavier-C  Seth-3B   Layne-1B
2  B     Quincy-2B  Luis-OF    Seth-3B    Layne-1B  Antwan-SS
3  B     Quincy-OF  Luis-OF    Xzavier-C  Seth-3B   Layne-OF
...
n  B     Quincy-2B  Luis-OF    Xzavier-C  Seth-3B   Layne-OF

Desired Result:

TEAM  UNIQUE
A     n
B     n

What I've Tried:

I know how to get all possible 5-player combinations for each team and summarise that. I'm just not sure how to get the combinations I'm looking for using the specific criteria as defined for their positions.

I wish I knew where to begin with this one. I could really use your help. Thank you!

Parfait · Accepted Answer

Consider several wrangling steps:

Assign new column as concatenation of PLAYER and POS.
Run by to split data frame by teams and run operations on splits (Rule #3).
Run combn on PLAYER_POS to choose 5 listings.
Run ave for running count of similar PLAYER.
Run Filter to keep data frames of 5 rows, 5 unique players, and adheres to positions criteria (Rule #1 and #2).

Base R code

# HELPER COLUMN
df$PLAYER_POS <- with(df, paste(PLAYER, POS, sep="_"))

# BUILD LIST OF DFs BY TEAM
df_list <- by(df, df$TEAM, function(sub){
  combn(sub$PLAYER_POS, 5, FUN = function(p) 
    transform(subset(sub, PLAYER_POS %in% p),
              PLAYER_NUM = ave(LP, PLAYER, FUN=seq_along)), 
    simplify = FALSE)
})
  
# FILTER LIST OF DFs BY TEAM
df_list <- lapply(df_list, function(dfs) 
  Filter(function(df) 
           nrow(df) == 5 & 
           max(df$PLAYER_NUM)==1 &
           length(df$POS[df$POS == "OF"]) <= 3 &
           length(df$POS[df$POS != "OF"]) == length(unique(df$POS[df$POS != "OF"])), 
         dfs)
)

# COUNT REMAINING DFs BY TEAM FOR UNIQUE n
lengths(df_list)
#  A  B 
# 18 20 

data.frame(TEAMS=names(df_list), UNIQUE=lengths(df_list), row.names=NULL)
#   TEAMS UNIQUE
# 1     A     18
# 2     B     20

Output (list of subsetted data frames)

df_list$A[[1]]
#   TEAM PLAYER LP POS PLAYER_POS PLAYER_NUM
# 1    A   Will  1  3B    Will_3B          1
# 3    A    Roy  2  1B     Roy_1B          1
# 5    A Jaylon  3  SS  Jaylon_SS          1
# 6    A   Dean  4  OF    Dean_OF          1
# 7    A  Yosef  5   C    Yosef_C          1
df_list$A[[2]]
df_list$A[[3]]
...
df_list$A[[18]]


df_list$B[[1]]
#    TEAM  PLAYER LP POS PLAYER_POS PLAYER_NUM
# 9     B  Quincy  1  2B  Quincy_2B          1
# 11    B    Luis  2  OF    Luis_OF          1
# 12    B Xzavier  3   C  Xzavier_C          1
# 13    B    Seth  4  3B    Seth_3B          1
# 14    B   Layne  5  1B   Layne_1B          1
df_list$B[[2]]
df_list$B[[3]]
...
df_list$B[[20]]

Count Unique Combinations Meeting Specific Criteria

Answers (2)

A messier solution,

Related Questions