Reputation: 55
I have the following data frame (my real data frame has more rows and columns but is ostensibly structured as such):
Root_R1 = c(1,2,3,4,5)
Root_R2 = c(1,0,3,0,0)
Root_R3 = c(1,0,3,0,0)
Shoot_R1 = c(1,0,3,4,5)
Shoot_R2 = c(0,0,31,4,5)
Shoot_R3 = c(0,0,0,0,0)
data.frame(Root_R1, Root_R2, Root_R3, Shoot_R1, Shoot_R2, Shoot_R3)
Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
1 Root Root Root Shoot Shoot Shoot
2 1 1 1 1 0 0
3 2 0 0 0 0 0
4 3 3 3 3 31 0
5 4 0 0 4 4 0
6 5 0 0 5 5 0
What I would like to do is filter this data frame and find all of the rows where at least two of the columns associated with a tissue type (Root, Shoot, etc.) have values greater than 0. So, for the columns associated with "Roots" (Column 1,2,3), Rows 1 - 3 should be returned while the ones associated with "Shoots"will have Rows 4 - 6 returned. I thought a ifelse
code would work but that seems inefficient. Would filter
from dplyr
be more appropriate?
Upvotes: 1
Views: 624
Reputation: 76641
Here is a base R solution.
grep
tells the "Root"
columns from the "Shoot"
ones. Then apply
loops returning logical (row) indices and which
take care of sub-setting the data.frame.
Root_R1 = c("Root",1,2,3,4,5)
Root_R2 = c("Root",1,0,3,0,0)
Root_R3 = c("Root",1,0,3,0,0)
Shoot_R1 = c("Shoot",1,0,3,4,5)
Shoot_R2 = c("Shoot",0,0,31,4,5)
Shoot_R3 = c("Shoot",0,0,0,0,0)
df1 <- data.frame(Root_R1, Root_R2, Root_R3, Shoot_R1, Shoot_R2, Shoot_R3)
df1 <- df1[-1,]
df1[] <- lapply(df1, as.integer)
root <- grep("Root", names(df1))
shoot <- grep("Shoot", names(df1))
ok_root <- which(apply(df1[root], 1, \(x) sum(x > 0L) >= 2L))
ok_shoot <- which(apply(df1[shoot], 1, \(x) sum(x > 0L) >= 2L))
df1[ok_root, ]
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2 1 1 1 1 0 0
#> 4 3 3 3 3 31 0
df1[ok_shoot, ]
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4 3 3 3 3 31 0
#> 5 4 0 0 4 4 0
#> 6 5 0 0 5 5 0
Created on 2022-06-09 by the reprex package (v2.0.1)
Following a question in comments
And assuming I want to change the numerical value of the cut-off, I would change this part of the code (
sum(x > 0L)
) and if I wanted to change the number of rows that meet the cut-off, I would change this:>= 2L
?
here is a function to solve the problem.
special_subset <- function(x, colpattern, cutoff = 0L, numrows = 2L) {
i_cols <- grep(colpattern, names(x))
ok <- which(apply(x[i_cols], 1, \(y) sum(y > cutoff) >= numrows))
x[ok, ]
}
special_subset(df1, "Root")
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2 1 1 1 1 0 0
#> 4 3 3 3 3 31 0
special_subset(df1, "Shoot", cutoff = 1)
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4 3 3 3 3 31 0
#> 5 4 0 0 4 4 0
#> 6 5 0 0 5 5 0
Created on 2022-06-09 by the reprex package (v2.0.1)
To pass more than one colpattern
to the function above, use a lapply
loop.
In the two exaples below, first I use the new pipe operator introduced in R 4.2.0 and in the second a standard lapply
.
tissue_type <- c("Root", "Shoot")
tissue_type |>
lapply(\(pat, data) special_subset(data, pat), data = df1)
#> [[1]]
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2 1 1 1 1 0 0
#> 4 3 3 3 3 31 0
#>
#> [[2]]
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4 3 3 3 3 31 0
#> 5 4 0 0 4 4 0
#> 6 5 0 0 5 5 0
lapply(tissue_type, \(pat, data) special_subset(data, pat), data = df1)
#> [[1]]
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 2 1 1 1 1 0 0
#> 4 3 3 3 3 31 0
#>
#> [[2]]
#> Root_R1 Root_R2 Root_R3 Shoot_R1 Shoot_R2 Shoot_R3
#> 4 3 3 3 3 31 0
#> 5 4 0 0 4 4 0
#> 6 5 0 0 5 5 0
Created on 2022-06-17 by the reprex package (v2.0.1)
Upvotes: 1