Reputation: 360
I need help filtering the following dataframe (this is a simple example):
mx = as.data.frame(cbind(c("-", "-", "-", "-", "mutation", "+", "+", "+", "+") ,
c(F, T, F, F, F, F, T, F,T)) )
colnames(mx) = c("mutation", "distance")
mx
mutation distance
1 - FALSE
2 - TRUE
3 - FALSE
4 - FALSE
5 mutation FALSE
6 + FALSE
7 + TRUE
8 + FALSE
9 + TRUE
I need to filter based on the second column (distance), so that it looks like this:
mutation distance
3 - FALSE
4 - FALSE
5 mutation FALSE
6 + FALSE
I need to remove all rows up to and including the last TRUE that comes before the row where mx$mutation == "mutation" (so rows 1 and 2), and all rows from the first TRUE after that row onward (so row 7 and beyond).
Upvotes: 6
Views: 1850
Reputation: 11955
# sample data (note that I have added a few extra rows at the end)
mx = data.frame(mutation = c("-", "-", "-", "-", "mutation", "+", "+", "+", "+", "-", "mutation", "+","+") ,
distance = c(F, T, F, F, F, F, T, F,T,F,F,F,T))
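# rows that hold a "mutation"
mutation_idx <- which(mx$mutation == "mutation")
# rows where distance is TRUE
distance_T_idx <- which(mx$distance == TRUE)
# for each mutation row, the position in distance_T_idx of the last TRUE before it
interval_idx <- findInterval(mutation_idx, distance_T_idx)
# keep the rows strictly between that TRUE and the next one
rows <- lapply(interval_idx, function(x) (distance_T_idx[x] + 1):(distance_T_idx[x + 1] - 1))
mx[unlist(rows), ]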
Output is:
mutation distance
3 - FALSE
4 - FALSE
5 mutation FALSE
6 + FALSE
10 - FALSE
11 mutation FALSE
12 + FALSE
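One caveat with this approach: if a mutation row has no TRUE before it, findInterval() returns 0 and distance_T_idx[0] is empty; if it has no TRUE after it, distance_T_idx[x + 1] is NA. In both cases the `:` inside lapply() stops with an error. A minimal sketch of one way around that, assuming the same column names and a logical distance column, is to pad the TRUE positions with the sentinel rows 0 and nrow + 1 (the helper name keep_rows is hypothetical):

```r
# Hypothetical helper: keep the rows strictly between the TRUE rows that
# bracket each "mutation" row; sentinels cover a missing TRUE on either side
keep_rows <- function(df) {
  mutation_idx <- which(df$mutation == "mutation")
  # 0 stands for "no TRUE before", nrow + 1 for "no TRUE after"
  bounds <- c(0L, which(df$distance), nrow(df) + 1L)
  # position in bounds of the last boundary at or before each mutation row
  interval_idx <- findInterval(mutation_idx, bounds)
  rows <- lapply(interval_idx, function(x) (bounds[x] + 1):(bounds[x + 1] - 1))
  # unique() avoids duplicates when two mutation rows share an interval
  df[unique(unlist(rows)), ]
}

# Example with no TRUE before the first mutation row and none after the second
ex <- data.frame(
  mutation = c("-", "mutation", "+", "+", "-", "mutation", "+"),
  distance = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)
keep_rows(ex)  # keeps rows 1, 2, 3, 5, 6, 7
```

On the sample data above it returns the same rows (3 to 6 and 10 to 12) as the original code.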
Upvotes: 0
Reputation: 13334
You can use the which() function to identify the rows to remove:
# row number of the last TRUE before the row where mutation == "mutation"
last_true_before_mutation <- max(which(mx$distance == 'TRUE')[which(mx$distance == 'TRUE') < which(mx$mutation == 'mutation')])
# row number of the first TRUE after that row
first_true_after_mutation <- min(which(mx$distance == 'TRUE')[which(mx$distance == 'TRUE') > which(mx$mutation == 'mutation')])
# all rows to remove
rem_rows <- c(seq_len(last_true_before_mutation), seq(first_true_after_mutation, nrow(mx)))
# remove the appropriate rows
mx[-rem_rows, ]
Here is a general-purpose function you can use:
before_after_mutation <- function(df) {
  last_true_before_mutation <- max(which(df$distance == 'TRUE')[which(df$distance == 'TRUE') < which(df$mutation == 'mutation')])
  first_true_after_mutation <- min(which(df$distance == 'TRUE')[which(df$distance == 'TRUE') > which(df$mutation == 'mutation')])
  rem_rows <- c(seq_len(last_true_before_mutation), seq(first_true_after_mutation, nrow(df)))
  res <- df[-rem_rows, ]
  return(res)
}
Usage:
before_after_mutation(mx)
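Comparing with the string 'TRUE' is what lets this work on the question's original mx, whose distance column ends up as text because of cbind() (a factor in R versions before 4.0). A small illustration with made-up vectors that the same comparison also works on a genuine logical column, because R coerces the logical to character before comparing:

```r
# distance as text, as produced by as.data.frame(cbind(...)) in R >= 4.0
chr_distance <- c("FALSE", "TRUE", "FALSE", "TRUE")
# the same values as a factor, as produced in R < 4.0
fct_distance <- factor(chr_distance)
# the same values as a proper logical vector, as produced by data.frame()
lgl_distance <- c(FALSE, TRUE, FALSE, TRUE)

# all three give rows 2 and 4: the logical is coerced to "FALSE"/"TRUE"
# before the comparison, and a factor is compared by its labels
which(chr_distance == 'TRUE')
which(fct_distance == 'TRUE')
which(lgl_distance == 'TRUE')
```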
Upvotes: 0
Reputation: 887118
We can create a grouping variable from the cumulative sum of the logical column ('distance') and then filter:
library(dplyr)
mx %>%
group_by(grp = cumsum(distance)) %>%
filter(any(mutation == "mutation") & !distance) %>%
ungroup %>%
select(-grp)
# A tibble: 4 x 2
# mutation distance
# <fctr> <lgl>
#1 - F
#2 - F
#3 mutation F
#4 + F
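The same grouping idea can be written in base R without dplyr. A minimal sketch, using ave() as a swap-in for group_by()/filter() (not part of this answer) and assuming distance is logical:

```r
mx <- data.frame(
  mutation = c("-", "-", "-", "-", "mutation", "+", "+", "+", "+"),
  distance = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
)
# group id: increases by one at every TRUE, so each TRUE opens a new group
grp <- cumsum(mx$distance)
# for every row, does its group contain a "mutation" row?
has_mutation <- ave(mx$mutation == "mutation", grp, FUN = any)
# keep rows in such groups, dropping the TRUE row that opens the group
res <- mx[has_mutation & !mx$distance, ]
res
```

Unlike the tibble output, this keeps the original row names (3, 4, 5, 6).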
NOTE: We can create the data.frame directly with data.frame(); there is no need for cbind. In fact, cbind changes the column types here: it converts its arguments to a matrix, and a matrix can hold only a single type, so distance ends up stored as text instead of logical.
mx = data.frame(mutation = c("-", "-", "-", "-", "mutation", "+", "+", "+", "+"),
                distance = c(F, T, F, F, F, F, T, F, T))
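To see the type problem from the note, one can compare the column classes of the two constructions (a small illustration; the cbind() columns come out as character in R >= 4.0 and as factors in earlier versions, because of the old stringsAsFactors default):

```r
via_cbind <- as.data.frame(cbind(mutation = c("-", "mutation", "+"),
                                 distance = c(FALSE, TRUE, FALSE)))
via_df <- data.frame(mutation = c("-", "mutation", "+"),
                     distance = c(FALSE, TRUE, FALSE))

sapply(via_cbind, class)  # distance is no longer logical
sapply(via_df, class)     # distance stays logical

# cumsum() needs a numeric or logical vector, so the grouping approach
# works on via_df but not on via_cbind
cumsum(via_df$distance)
```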
Upvotes: 2