Dirk Buttke

Reputation: 355

How to drop columns from data frame in R based on specified order of row values

I wonder how I can drop columns from a data frame in R based on specified order of row values.

Assume the following data frame:

df <- data.frame(a = c(1,6,2,5,2,0,9,3,21,15,4,0,5,2,1), 
                 b = c(0,0,1,0,0,0,5,0,0,0,0,2,0,0,0), 
                 c = c(1,1,1,1,0,0,0,0,0,10,10,10,10,10,0))
    a b  c
1   1 0  1
2   6 0  1
3   2 1  1
4   5 0  1
5   2 0  0
6   0 0  0
7   9 5  0
8   3 0  0
9  21 0  0
10 15 0 10
11  4 0 10
12  0 2 10
13  5 0 10
14  2 0 10
15  1 0  0

I now want to identify and drop any column that contains the following sequence of consecutive row values: 0, any other value, 0, 0, 0. Let's say this sequence must occur at least 3 times within a column for that column to be dropped. In my example only column b qualifies, so I would like to get the following:

    a  c
1   1  1
2   6  1
3   2  1
4   5  1
5   2  0
6   0  0
7   9  0
8   3  0
9  21  0
10 15 10
11  4 10
12  0 10
13  5 10
14  2 10
15  1  0

Thanks!

Upvotes: 1

Views: 72

Answers (1)

IceCreamToucan

Reputation: 28675

You can use a "rollapply"-type function to check your condition on each window of 5 consecutive elements, then sum the matches and test whether there are at least 3.

Depending on the particular problem, you can change the window width (5, the number of elements in your pattern), the required number of matches (3), or the condition-checking function my_condition.

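library(data.table) # for frollapply. or use library(zoo) and rollapply

# TRUE when elements 1, 3, 4 and 5 of the window are 0 (element 2 is not checked)
my_condition <- function(x) all(x[c(1, 3:5)] == 0)

# For each column, count the matching 5-element windows and flag columns with >= 3.
# fill = 0L makes the incomplete windows at the start count as "no match".
cond_match <- 
  sapply(df, function(x) sum(frollapply(x, 5, my_condition, fill = 0L)) >= 3)

# Keep only the columns that were not flagged
df[cond_match == FALSE] # or if df is a data.table, df[, cond_match == FALSE, with = FALSE]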

#     a  c
# 1   1  1
# 2   6  1
# 3   2  1
# 4   5  1
# 5   2  0
# 6   0  0
# 7   9  0
# 8   3  0
# 9  21  0
# 10 15 10
# 11  4 10
# 12  0 10
# 13  5 10
# 14  2 10
# 15  1  0
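
Note that my_condition does not look at the second element of the window, so a run of five zeros also counts as a match. If "any other value" in the question is meant as "any non-zero value", a stricter variant might look like the sketch below. It uses only base R (embed() instead of frollapply), and the names count_pattern, n_matches and result are made up for illustration. The pattern width of 5 is hard-coded.

```r
df <- data.frame(a = c(1,6,2,5,2,0,9,3,21,15,4,0,5,2,1),
                 b = c(0,0,1,0,0,0,5,0,0,0,0,2,0,0,0),
                 c = c(1,1,1,1,0,0,0,0,0,10,10,10,10,10,0))

# Count the windows of 5 consecutive values that look like: 0, non-zero, 0, 0, 0.
# (Assumption: "any other value" means "not 0".)
count_pattern <- function(x) {
  if (length(x) < 5) return(0L)
  # embed() gives one row per window with the newest value first,
  # so the columns are reversed to restore the original order.
  w <- embed(x, 5)[, 5:1, drop = FALSE]
  sum(w[, 1] == 0 & w[, 2] != 0 & rowSums(w[, 3:5, drop = FALSE] != 0) == 0)
}

n_matches <- vapply(df, count_pattern, integer(1))

# Keep the columns with fewer than 3 matches
result <- df[, n_matches < 3, drop = FALSE]
```

On the example data this gives 0, 3 and 0 matches for a, b and c, so result contains columns a and c, the same as above. The two versions only differ for columns that contain runs of five or more zeros.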

Upvotes: 4
