How to remove rows from a data frame when a % of columns have a value less than specified?

Question

I have some data that I want to filter. I want to be able to say, "If a specified percentage of the values in a row are less than a given threshold, remove that row from the data frame."

Here is some sample data.

       Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
Item1        0       0       0       0       0       0
Item2      478     440     522     578    1066    1045
Item3       16      14       9       6       6      20

Let's say I want to remove every row in which at least 50% of the columns have a value less than 10. In that scenario, the Item1 and Item3 rows are removed.

If I change the criterion to at least 50% of columns with a value less than 7, only Item1 is removed, and Item2 and Item3 remain.
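The fraction of each row that falls below a threshold can be computed in one vectorized step. A minimal sketch, assuming the sample values are held in a numeric matrix `m` (a name introduced here only for illustration):

```r
## Reproduce the sample values as a numeric matrix (row names = items)
m <- matrix(c(  0,   0,   0,   0,    0,    0,
              478, 440, 522, 578, 1066, 1045,
               16,  14,   9,   6,    6,   20),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("Item", 1:3), paste0("Sample", 1:6)))

## m < threshold gives a logical matrix; rowMeans() turns each row of
## TRUE/FALSE values into the fraction of columns below the threshold
frac_below_10 <- rowMeans(m < 10)  # Item1 = 1, Item2 = 0, Item3 = 0.5
frac_below_7  <- rowMeans(m < 7)   # Item1 = 1, Item2 = 0, Item3 = 1/3

print(frac_below_10)
print(frac_below_7)
```

Both expected outcomes follow if a row is dropped when its fraction is 0.5 or more: Item3 sits exactly at 0.5 for a threshold of 10, so it goes, but only 2 of its 6 values (about 0.33) are below 7, so it stays.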

What's a neat way to accomplish this in R? It seems like a simple problem, and I want to avoid writing messy code for it. From what I've read so far, maybe I should be using lapply()? I appreciate any insight.

G5W · Accepted Answer

You can do this just by indexing.

## reproduce your data
df = read.table(text="ItemNum Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
Item1   0   0   0   0   0   0
Item2   478 440 522 578 1066 1045
Item3   16  14  9   6   6   20",
header=TRUE, stringsAsFactors=FALSE)

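## df[,2:7] < 10 is a logical matrix; rowSums() counts the TRUEs per row.
## Keep rows with fewer than 3 of the 6 samples (i.e. under 50%) below 10.
df = df[which(rowSums(df[,2:7] < 10) < 3), ]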
df
   ItemNum Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
2   Item2     478     440     522     578    1066    1045
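The indexing above hard-codes the threshold (10) and the cutoff count (3 of 6 columns). A minimal sketch that takes both as parameters, assuming the sample columns can be picked out by names starting with "Sample"; `drop_low_rows` is a hypothetical helper name, not an existing function:

```r
## Hypothetical helper: drop rows in which at least `pct` (0 to 1) of the
## columns in `cols` hold a value below `threshold`.
drop_low_rows <- function(df, cols, threshold, pct = 0.5) {
  ## Fraction of the selected columns that fall below the threshold, per row
  frac_below <- rowMeans(df[, cols, drop = FALSE] < threshold)
  ## Keep rows whose fraction is strictly under pct; which() also drops
  ## rows where the comparison is NA, as in the indexing above
  df[which(frac_below < pct), , drop = FALSE]
}

df <- read.table(text = "ItemNum Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
Item1   0   0   0   0   0   0
Item2   478 440 522 578 1066 1045
Item3   16  14  9   6   6   20",
header = TRUE, stringsAsFactors = FALSE)

## Column positions of the sample columns (2:7 here)
sample_cols <- grep("^Sample", names(df))

kept_10 <- drop_low_rows(df, sample_cols, threshold = 10)  # keeps Item2 only
kept_7  <- drop_low_rows(df, sample_cols, threshold = 7)   # keeps Item2 and Item3

print(kept_10)
print(kept_7)
```

Using rowMeans() instead of rowSums() lets the cutoff be given directly as a percentage, so the same call works regardless of how many sample columns there are.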
