I wonder how I can drop columns from a data frame in R based on a specified sequence of row values.
Assume the following data frame:
df <- data.frame(a = c(1,6,2,5,2,0,9,3,21,15,4,0,5,2,1),
b = c(0,0,1,0,0,0,5,0,0,0,0,2,0,0,0),
c = c(1,1,1,1,0,0,0,0,0,10,10,10,10,10,0))
a b c
1 1 0 1
2 6 0 1
3 2 1 1
4 5 0 1
5 2 0 0
6 0 0 0
7 9 5 0
8 3 0 0
9 21 0 0
10 15 0 10
11 4 0 10
12 0 2 10
13 5 0 10
14 2 0 10
15 1 0 0
I now want to identify and drop any column that contains the following sequence of row values: 0, any other value, 0, 0, 0. Let's say this sequence must occur at least 3 times within a column for that column to be dropped. Using my example, I would like to end up with the following:
a c
1 1 1
2 6 1
3 2 1
4 5 1
5 2 0
6 0 0
7 9 0
8 3 0
9 21 0
10 15 10
11 4 10
12 0 10
13 5 10
14 2 10
15 1 0
Thanks!
You can use a "rollapply"-type function to check your condition on each window of 5 elements, then sum the matches and see whether there are, e.g., >= 3 of them.
You can change the window width (5, the number of elements in your pattern), the required number of matches (3), or the condition-checking function my_condition, depending on the particular problem.
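library(data.table) # for frollapply; or use library(zoo) and rollapply
# TRUE when elements 1, 3, 4 and 5 of the window are 0; element 2 is not checked
my_condition <- function(x) all(x[c(1, 3:5)] == 0)
# for each column, count the matching windows of width 5 and flag columns with >= 3 matches
cond_match <-
  sapply(df, function(x) sum(frollapply(x, 5, my_condition, fill = 0L)) >= 3)
# keep only the columns that were not flagged
df[cond_match == FALSE] # or if df is a data.table, df[, cond_match == FALSE, with = FALSE]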
# a c
# 1 1 1
# 2 6 1
# 3 2 1
# 4 5 1
# 5 2 0
# 6 0 0
# 7 9 0
# 8 3 0
# 9 21 0
# 10 15 10
# 11 4 10
# 12 0 10
# 13 5 10
# 14 2 10
# 15 1 0
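If you would rather avoid the extra package, or need the second element of the window to be strictly non-zero, here is a rough base R sketch of the same rolling-window idea. It assumes "any other value" means "any non-zero value", which is my reading of the question and not something it states outright. The helper name count_pattern and the variables n_matches and df_kept are made up for this illustration.

```r
# Example data from the question.
df <- data.frame(a = c(1,6,2,5,2,0,9,3,21,15,4,0,5,2,1),
                 b = c(0,0,1,0,0,0,5,0,0,0,0,2,0,0,0),
                 c = c(1,1,1,1,0,0,0,0,0,10,10,10,10,10,0))

# Hypothetical helper: count the windows of length 5 that look like
# 0, non-zero, 0, 0, 0 (assumption: "any other value" means non-zero).
count_pattern <- function(x) {
  if (length(x) < 5) return(0L)
  # embed() gives one row per window, with the window's values in reverse
  # order, so the columns are reversed to restore the original order.
  w <- embed(x, 5)[, 5:1, drop = FALSE]
  sum(w[, 1] == 0 & w[, 2] != 0 & w[, 3] == 0 & w[, 4] == 0 & w[, 5] == 0)
}

# Number of matching windows per column.
n_matches <- sapply(df, count_pattern)

# Keep the columns with fewer than 3 matches.
df_kept <- df[, n_matches < 3, drop = FALSE]
```

On the example data this should keep columns a and c, the same as above. The only difference from my_condition is the extra check on the second element, so a run of five zeros no longer counts as a match.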