StatMan22
StatMan22

Reputation: 13

Assign numbers to elements of data.frame based on order of appearance within a group

I have a data-set with ball-by-ball cricket data, which is ordered by the sequence of events in the match (i.e., "over" in the data frame). I would like to add a column assigning a batting position (1,2,3,...) to each batsmen within batting_team and match_id. There are two batsmen at a time-so they can bat, somebody else bats, and they come back.

I've tried things like tally() but that doesn't do exactly what I want, and I suspect there may be a potential solution using factors, but I don't know how to do that by group.

Here is a sample data frame:

mydata <- data.frame(batting_team=c(rep("South Africa",6),rep("England",6)),
                     match_id=c(rep(343434,6),rep(353535,6)),
                     over=rep(seq(0.1,0.6,0.1),2),
                     batsman=c("HM Amla","HM Amla","GC Smith","HM Amla","JH Kallis","JH Kallis",
                               "JJ Roy","JJ Roy","JJ Roy","JM Bairstow","JM Bairstow","JJ Roy"))

Here is my desired output:

   batting_team match_id over     batsman batting_order
1  South Africa   343434  0.1     HM Amla             1
2  South Africa   343434  0.2     HM Amla             1
3  South Africa   343434  0.3    GC Smith             2
4  South Africa   343434  0.4     HM Amla             1
5  South Africa   343434  0.5   JH Kallis             3
6  South Africa   343434  0.6   JH Kallis             3
7       England   353535  0.1      JJ Roy             1
8       England   353535  0.2      JJ Roy             1
9       England   353535  0.3      JJ Roy             1
10      England   353535  0.4 JM Bairstow             2
11      England   353535  0.5 JM Bairstow             2
12      England   353535  0.6      JJ Roy             1

Upvotes: 1

Views: 39

Answers (1)

akrun
akrun

Reputation: 887951

An option would be to group by 'batting_team' and do a match with the unique 'batsman' with the 'batsman' to get the index

library(dplyr)
mydata %>%
   group_by(batting_team) %>% 
   mutate(batting_order = match(batsman, unique(batsman)))
# A tibble: 12 x 5
# Groups:   batting_team [2]
#   batting_team match_id  over batsman     batting_order
#   <fct>           <dbl> <dbl> <fct>               <int>
# 1 South Africa   343434   0.1 HM Amla                 1
# 2 South Africa   343434   0.2 HM Amla                 1
# 3 South Africa   343434   0.3 GC Smith                2
# 4 South Africa   343434   0.4 HM Amla                 1
# 5 South Africa   343434   0.5 JH Kallis               3
# 6 South Africa   343434   0.6 JH Kallis               3
# 7 England        353535   0.1 JJ Roy                  1
# 8 England        353535   0.2 JJ Roy                  1
# 9 England        353535   0.3 JJ Roy                  1
#10 England        353535   0.4 JM Bairstow             2
#11 England        353535   0.5 JM Bairstow             2
#12 England        353535   0.6 JJ Roy                  1

Upvotes: 2

Related Questions