Reputation: 850
I have some data that I want to filter. I want to be able to say, "If at least a specified percentage of the values in a row are less than a given threshold, remove that row from the data frame."
Here is some sample data.
Sample1, Sample2, Sample3, Sample4, Sample5, Sample6
Item1 0 0 0 0 0 0
Item2 478 440 522 578 1066 1045
Item3 16 14 9 6 6 20
Let's say I want to remove rows in which at least 50% of the columns have a value of less than 10. In that scenario the Item1 row is removed (6 of 6 values are below 10), and the Item3 row is removed (3 of 6).
If I change the criterion to 50% of columns with a value of less than 7, then only Item1 goes, and Item2 and Item3 remain.
What's a neat way to accomplish this in R? This is a simple issue and I want to avoid writing messy code to accomplish it. From what I've read so far I should be doing this with lapply() maybe? I appreciate any insight.
Upvotes: 0
Views: 650
Reputation: 3230
library(data.table)
dat <- fread("Item Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
Item1 0 0 0 0 0 0
Item2 478 440 522 578 1066 1045
Item3 16 14 9 6 6 20")
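# Keep rows where more than half of the sample values are at or above slice_val,
# i.e. drop rows where at least 50% of the values are below it.
slice_val <- 10
dat[apply(dat[, !"Item"], 1, function(x) sum(x >= slice_val)/length(x)) > 0.5]
Item Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
1: Item2 478 440 522 578 1066 1045
# Same filter with a lower threshold: Item3 now has only 2 of 6 values below it.
slice_val <- 7
dat[apply(dat[, !"Item"], 1, function(x) sum(x >= slice_val)/length(x)) > 0.5]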
Item Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
1: Item2 478 440 522 578 1066 1045
2: Item3 16 14 9 6 6 20
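If you would rather avoid the row-wise apply(), the same filter can be sketched with rowMeans() on a logical matrix. This is only an illustration of the idea above: the names max_frac, sample_cols, frac_below and kept are mine, and I am assuming a data.table version recent enough to support fread(text = ...) and the ..var column-selection prefix.

```r
library(data.table)

dat <- fread(text = "Item Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
Item1 0 0 0 0 0 0
Item2 478 440 522 578 1066 1045
Item3 16 14 9 6 6 20")

slice_val <- 10   # values below this count as "low"
max_frac  <- 0.5  # drop rows where at least this share of values is low

# Every column except the label column (illustrative variable name)
sample_cols <- setdiff(names(dat), "Item")

# Comparing the table to a scalar gives a logical matrix;
# rowMeans() then yields the share of low values per row.
frac_below <- rowMeans(dat[, ..sample_cols] < slice_val)

# Keep only rows whose share of low values is under the cutoff
kept <- dat[frac_below < max_frac]
kept
```

With slice_val <- 10 this keeps only Item2, the same result as the apply() version.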
Upvotes: 1
Reputation: 37641
You can do this just by indexing with rowSums(); no apply() or lapply() is needed.
## reproduce your data
df = read.table(text="ItemNum Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
Item1 0 0 0 0 0 0
Item2 478 440 522 578 1066 1045
Item3 16 14 9 6 6 20",
header=TRUE, stringsAsFactors=FALSE)
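## count the values below 10 in each row, and keep rows where
## fewer than 3 of the 6 sample columns (i.e. under 50%) are below 10
df = df[which(rowSums(df[,2:7] < 10) < 3), ]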
df
ItemNum Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
2 Item2 478 440 522 578 1066 1045
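The line above hard-codes the threshold (10), the column range (2:7) and the count (3). If you want to change the threshold or the percentage without editing those numbers, something like the following might work. This is a rough sketch, not a definitive implementation: drop_low_rows and its arguments are names I made up for illustration, and I am assuming every numeric column is a sample column unless you pass cols yourself.

```r
## reproduce the data
df = read.table(text="ItemNum Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
Item1 0 0 0 0 0 0
Item2 478 440 522 578 1066 1045
Item3 16 14 9 6 6 20",
header=TRUE, stringsAsFactors=FALSE)

## Hypothetical helper (name and arguments are illustrative).
## Drops rows where at least `frac` of the values in `cols` are below `threshold`.
drop_low_rows = function(df, threshold, frac = 0.5,
                         cols = vapply(df, is.numeric, logical(1))) {
  ## share of values below the threshold in each row (NAs are ignored)
  frac_below = rowMeans(df[, cols, drop = FALSE] < threshold, na.rm = TRUE)
  ## which() also drops rows whose share is NaN (all values missing)
  df[which(frac_below < frac), , drop = FALSE]
}

drop_low_rows(df, 10)  ## keeps Item2 only
drop_low_rows(df, 7)   ## keeps Item2 and Item3
```

Using rowMeans() instead of rowSums() means the cutoff is expressed as a fraction, so it does not need to be recomputed when the number of sample columns changes.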
Upvotes: 1