Reputation: 63
I have a large grouped dataset that I would like to filter. Sample data is shown below.
Data <- data.frame (ID1 = c(1,1,1,2,2,2,3,3,3), Score1 = c(360,360,360,250,250,250,195,195,195), ID2 = c(7,8,9,7,225,98,7,225,174), Score2 = c(330,150,100,330,275,180,330,275,210))
Edit: Pasting an alternate example that has an edge case not in the original:
Data <- data.frame (ID1 = c(1,1,1,2,2,2,3,3,3), Score1 = c(360,360,360,250,250,250,195,195,195), ID2 = c(7,8,9,7,8,98,7,225,174), Score2 = c(330,275,100,330,275,180,330,275,210))
The data is grouped by ID1, and I would like to keep the first row of each group. However, once an ID2 value has been selected by an earlier group, it is no longer a candidate for later groups.
The expected outcome for the alternate example is:
Data_Filtered <- data.frame (ID1 = c(1,2,3), Score1 = c(360,250,195), ID2 = c(7,8,225), Score2 = c(330,275,275))
Upvotes: 1
Views: 170
Reputation: 887901
We can keep only the first occurrence of each 'ID2', then group by 'ID1' and slice the first row
library(dplyr)
Data %>%
  distinct(ID2, .keep_all = TRUE) %>%
  group_by(ID1) %>%
  slice(1)
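As a quick check of where this pipeline stops being enough, here is a small sketch that runs it on the alternate example from the question. The name `res` is only for illustration. Because `distinct()` keeps the first occurrence of each ID2 across the whole data frame, ID2 = 8 is dropped from group 2 (it already appears in group 1), even though group 1 actually selects 7.

```r
library(dplyr)

# Alternate example from the question (ID2 = 8 appears in both group 1 and group 2)
Data <- data.frame(ID1 = c(1,1,1,2,2,2,3,3,3),
                   Score1 = c(360,360,360,250,250,250,195,195,195),
                   ID2 = c(7,8,9,7,8,98,7,225,174),
                   Score2 = c(330,275,100,330,275,180,330,275,210))

res <- Data %>%
  distinct(ID2, .keep_all = TRUE) %>%
  group_by(ID1) %>%
  slice(1) %>%
  ungroup()

res$ID2
# 7 98 225, whereas the expected outcome has 7 8 225
```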
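With the updated dataset, one option is to split by 'ID1' and loop over the groups, skipping any 'ID2' that an earlier group has already selected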
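# Split into one data frame per ID1 group
lst1 <- split(Data, Data$ID1)
# The first group always contributes its first row
out <- lst1[[1]][1, ]
# seq_along(...)[-1] is empty when there is only one group (2:length() would not be)
for (i in seq_along(lst1)[-1]) {
  # Drop rows whose ID2 was already selected, then take the first remaining row
  out <- rbind(out, lst1[[i]][!lst1[[i]]$ID2 %in% out$ID2, ][1, ])
}
out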
#  ID1 Score1 ID2 Score2
#1   1    360   7    330
#5   2    250   8    275
#8   3    195 225    275
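The same greedy idea can also be wrapped in a small function. This is only a sketch: the name `pick_first_unused` and its arguments are hypothetical, not part of any package. Unlike taking `[1, ]` of an empty data frame, which yields a row of NAs, this version simply skips a group when all of its ID2 values have already been used. That is an assumption about the desired behaviour.

```r
# Hypothetical helper: for each group (in order of appearance), keep the first
# row whose key has not been selected by an earlier group
pick_first_unused <- function(df, group_col, key_col) {
  used <- c()       # keys selected so far
  picked <- list()  # one-row data frames, one per group that has a candidate
  for (g in unique(df[[group_col]])) {
    grp  <- df[df[[group_col]] == g, , drop = FALSE]
    cand <- grp[!grp[[key_col]] %in% used, , drop = FALSE]
    if (nrow(cand) > 0) {
      picked[[length(picked) + 1]] <- cand[1, , drop = FALSE]
      used <- c(used, cand[[key_col]][1])
    }
  }
  do.call(rbind, picked)
}

# Alternate example from the question
Data <- data.frame(ID1 = c(1,1,1,2,2,2,3,3,3),
                   Score1 = c(360,360,360,250,250,250,195,195,195),
                   ID2 = c(7,8,9,7,8,98,7,225,174),
                   Score2 = c(330,275,100,330,275,180,330,275,210))

result <- pick_first_unused(Data, "ID1", "ID2")
result$ID2
# 7 8 225
```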
Upvotes: 1