Reputation: 185
I have the following df (my actual df is bigger but this is better to explain)
df
Date Team Result
1 20-10-2014 Bulls W
2 20-11-2014 Bulls W
3 21-11-2014 Bulls W
4 22-10-2014 Bulls L
5 23-10-2014 Bulls L
6 24-10-2014 Bulls W
7 25-10-2014 Bulls W
8 26-10-2014 Bulls L
I used rle
to calculate winning streaks
df$Result <- as.character(df$Result)
result.rle <- rle(df$Result)
And the maximum amount of streaks
sort(w.rle$lengths, decreasing = TRUE)
What I would like to do now however is to filter the relevant matches with a winning streak above 2.
I checked out ?rle
and couldn't find what i was looking for. Any thoughts on how I can do this?
My end result should be:
#winning streak > 2
Date Team Result
1 20-10-2014 Bulls W
2 20-11-2014 Bulls W
3 21-11-2014 Bulls W
Upvotes: 1
Views: 259
Reputation: 92300
Here's a possible solution using the data.table::rleid
function
library(data.table) # V 1.9.6+
setDT(df)[, indx := .N, by = rleid(Result)][Result == "W" & indx > 2]
# Date Team Result indx
# 1: 20-10-2014 Bulls W 3
# 2: 20-11-2014 Bulls W 3
# 3: 21-11-2014 Bulls W 3
Another option is to hack the by
statement so we could reuse the Result
column in the same call (without creating indx
separately)
df[, .SD[.N > 2 & Result == "W"], by = rleid(df[["Result"]])]
# rleid Date Team Result
# 1: 1 20-10-2014 Bulls W
# 2: 1 20-11-2014 Bulls W
# 3: 1 21-11-2014 Bulls W
Upvotes: 4
Reputation: 3678
A solution using rle
:
df[intersect(which(rep(rle(df$Result)$lengths,rle(df$Result)$lengths)>2),which(rep(rle(df$Result)$values,rle(df$Result)$lengths)=='W')),]
# Date Team Result
# 1 20-10-2014 Bulls W
# 2 20-11-2014 Bulls W
# 3 21-11-2014 Bulls W
DATA :
df <- structure(list(Date = c("20-10-2014", "20-11-2014", "21-11-2014",
"22-10-2014", "23-10-2014", "24-10-2014", "25-10-2014", "26-10-2014"
), Team = c("Bulls", "Bulls", "Bulls", "Bulls", "Bulls", "Bulls",
"Bulls", "Bulls"), Result = c("W", "W", "W", "L", "L", "W", "W",
"L")), .Names = c("Date", "Team", "Result"), class = "data.frame", row.names = c(NA,
-8L))
rle(df$Result)
returns a list of two elements : the number of repetitions and the values of the repetitions. We replicate the lengths (we can use rle(df$Result)$lengths
) to have a vector of the same length as the number of rows of df. We then select the rows with a repetition > 2.
The intersect
takes the intersection of the index of the rows where :
In detail :
which(rep(rle(df$Result)$values,rle(df$Result)$lengths)=='W')
# [1] 1 2 3 6 7 # Those are the rows where it's a Win. Only those rows can be included
which(rep(rle(df$Result)$lengths,rle(df$Result)$lengths)>2)
# [1] 1 2 3 # those are the rows where it's a streak of 3 wins or losses
The intersection will give the expected output.
Upvotes: 1