skathan

Reputation: 689

Counting only first appearance of item in R?

I'm trying to compute Greens in Regulation (reaching the green in par minus 2 strokes or fewer) using shot data in R that looks like this:

Player     Shot Par To_Location   Hole
Tiger Woods 1   4   Fairway        1
Tiger Woods 2   4   Green          1 
Tiger Woods 3   4   Green          1
Tiger Woods 4   4   Hole           1
Tiger Woods 1   3   Rough          2
Tiger Woods 2   3   Green          2
Tiger Woods 3   3   Hole           2
Tiger Woods 1   4   Green          3
Tiger Woods 2   4   Green          3
Tiger Woods 3   4   Hole           3

I've been using the script below:

result <- df %>%
  group_by(Player) %>%
  summarize(GIR = sum(To_Location == "Green" & Par - Shot > 1) / n())

But the values aren't correct, most likely because it double-counts some of the greens (for example, when there's an eagle opportunity), and possibly also because I shouldn't be summing in this fashion.

I'd want a result that looked like this:

Player        GIR
Tiger Woods   .6666667

as he made green in regulation on two of the three holes.

Upvotes: 0

Views: 273

Answers (1)

Rorschach

Reputation: 32426

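The Par - Shot > 1 test is fine. The wrong values come from sum() counting every qualifying shot and n() counting shots instead of holes. Here is a way using dplyr that decides GIR once per hole and then averages over holes. It also creates a hole variable as mentioned in the comments, in case your data has none: a new hole starts wherever Shot is 1.

library(dplyr)

# New hole at every first shot (assumes rows are in playing order)
df$hole <- cumsum(df$Shot == 1)

result <- df %>% 
  group_by(Player, hole) %>%
  # TRUE if any shot reached the green with at least two strokes to spare
  summarize(GIR = any(To_Location == "Green" & Par - Shot > 1)) %>%
  # still grouped by Player here, so this averages over that player's holes
  summarize(GIR = mean(GIR))
#        Player       GIR
# 1 Tiger Woods 0.6666667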
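
To check any approach against the sample data from the question, here is a minimal, dependency-free sketch in base R. The data frame construction and the names hit, shot_level, per_hole and gir are mine, not from the original post. It computes the shot-level ratio from the question for comparison, then decides GIR once per hole and averages over holes.

```r
# Sample data from the question, rebuilt as a data frame
df <- data.frame(
  Player = "Tiger Woods",
  Shot = c(1, 2, 3, 4, 1, 2, 3, 1, 2, 3),
  Par  = c(4, 4, 4, 4, 3, 3, 3, 4, 4, 4),
  To_Location = c("Fairway", "Green", "Green", "Hole",
                  "Rough", "Green", "Hole",
                  "Green", "Green", "Hole"),
  Hole = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Shot-level flag: on the green with at least two strokes to spare
df$hit <- df$To_Location == "Green" & df$Par - df$Shot > 1

# Formula from the question: qualifying shots divided by all shots
shot_level <- sum(df$hit) / nrow(df)

# One TRUE/FALSE per player and hole, so a green counts at most once
per_hole <- aggregate(hit ~ Player + Hole, data = df, FUN = any)

# Share of holes with a green in regulation, per player
gir <- aggregate(hit ~ Player, data = per_hole, FUN = mean)
names(gir)[2] <- "GIR"
print(gir)
```

On the sample data, shot_level is 0.3 (3 qualifying shots out of 10 rows), while gir$GIR is 0.6666667 (2 of 3 holes), which matches the result the question asks for.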

Upvotes: 1
