Reputation: 689
So I'm trying to find Greens in Regulation (on the green in par minus 2 strokes or fewer) using shot data in R that looks like this:
Player Shot Par To_Location Hole
Tiger Woods 1 4 Fairway 1
Tiger Woods 2 4 Green 1
Tiger Woods 3 4 Green 1
Tiger Woods 4 4 Hole 1
Tiger Woods 1 3 Rough 2
Tiger Woods 2 3 Green 2
Tiger Woods 3 3 Hole 2
Tiger Woods 1 4 Green 3
Tiger Woods 2 4 Green 3
Tiger Woods 3 4 Hole 3
I've been using the script below:
result <- df %>%
  group_by(Player) %>%
  summarize(GIR = sum(To_Location == "Green" & Par - Shot > 1) / n())
But the values aren't correct, most likely because it double counts some of the greens (in the event that there's an eagle opportunity) but also possibly because I shouldn't be summing in this fashion?
I'd want a result that looked like this:
Player GIR
Tiger Woods .6666667
as he made green in regulation on two of the three holes.
Upvotes: 0
Views: 273
Reputation: 32426
Here is a way using top_n from dplyr to get the first row. It also creates a hole variable, as mentioned in the comments.
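library(dplyr)
# Number the holes from runs of equal Par (adjacent holes with the same par
# would merge, so use an existing hole column instead if the data has one)
g <- rle(df$Par)
df$hole <- rep(seq_along(g$values), times = g$lengths)
result <- df %>%
  group_by(Player) %>%
  mutate(holes = n_distinct(hole)) %>%   # holes played, counted before filtering
  filter(To_Location == "Green") %>%
  group_by(Player, hole) %>%
  top_n(1, -Shot) %>%                    # first shot to reach the green on each hole
  group_by(Player) %>%
  summarize(GIR = sum(Par - Shot > 1) / first(holes))
#        Player       GIR
# 1 Tiger Woods 0.6666667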
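For a self-contained check, here is a minimal sketch of another way to count each hole only once. It is not part of the approach above: it assumes the Hole column from the question is present, rebuilds the sample data by hand, and uses any() to flag each hole before averaging. The names gir and hit are made up for illustration.

```r
library(dplyr)

# Sample data from the question, rebuilt by hand
df <- data.frame(
  Player = "Tiger Woods",
  Shot = c(1, 2, 3, 4, 1, 2, 3, 1, 2, 3),
  Par = c(4, 4, 4, 4, 3, 3, 3, 4, 4, 4),
  To_Location = c("Fairway", "Green", "Green", "Hole",
                  "Rough", "Green", "Hole",
                  "Green", "Green", "Hole"),
  Hole = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
)

gir <- df %>%
  group_by(Player, Hole) %>%
  # One TRUE/FALSE per hole: was any shot on the green in par minus 2 or fewer?
  summarize(hit = any(To_Location == "Green" & Par - Shot > 1),
            .groups = "drop") %>%
  group_by(Player) %>%
  # Share of holes with a green in regulation
  summarize(GIR = mean(hit))
```

Because each hole contributes a single TRUE or FALSE, a hole with several qualifying shots on the green is counted once, and the denominator is the number of holes rather than the number of shots. On the sample data this gives 2 of 3 holes.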
Upvotes: 1