Reputation: 481
I want to extract the rows if all the columns are zero. In the first column of df
, there is a list of gene IDs that are repeated per individuals in the second column. I want to extract gene IDs if the df[,3:length(df)]
are all zero for all individuals in the next column.
> dim(df)
[1] 1040675 56
> df
ID INDV tra1 tr2 tr3 tra2 tr15 tr1b
ENS777 1 1.2 0 0 1.6 3.3 0
ENS777 2 1.2 0 0 1.6 3.3 0
ENS777 3 1.2 0 0 1.6 3.3 0
ENS777 4 1.2 0 0 1.6 3.3 0
ENS999 1 0 0 0 0 0 0
ENS999 2 0 0 0 0 0 0
ENS999 3 0 0 0 0 0 0
ENS999 4 0 0 0 0 0 0
ENS888 1 1.2 0 0 1.6 3.3 0
ENS888 2 1.2 0 0 1.6 3.3 0
ENS888 3 1.2 0 0 1.6 3.3 0
ENS888 4 1.2 0 0 1.6 3.3 0
So, the out put would be ENS999
in this case.
Upvotes: 0
Views: 271
Reputation: 389325
Using dplyr
-
library(dplyr)
df %>%
group_by(ID) %>%
filter(all(unlist(dplyr::select(cur_data(), tra1:tr1b) == 0))) %>%
ungroup %>%
distinct(ID)
# ID
# <chr>
#1 ENS999
Upvotes: 0
Reputation: 311
If I understand correctly, if all the columns are zero for all individuals.
You can try this:
library(data.table)
df <- fread("Book2.csv")
> df
ID INDV tra1 tr2 tr3 tra2 tr15 tr1b colsum
1: ENS777 1 1.2 0 0 1.6 3.3 0 6.1
2: ENS777 2 1.2 0 0 1.6 3.3 0 6.1
3: ENS777 3 1.2 0 0 1.6 3.3 0 6.1
4: ENS777 4 1.2 0 0 1.6 3.3 0 6.1
5: ENS999 1 0.0 0 0 0.0 0.0 0 0.0
6: ENS999 2 0.0 0 0 0.0 0.0 0 0.0
7: ENS999 3 0.0 0 0 0.0 0.0 0 0.0
8: ENS999 4 0.0 0 0 0.0 0.0 0 0.0
9: ENS888 1 1.2 0 0 1.6 3.3 0 6.1
10: ENS888 2 1.2 0 0 1.6 3.3 0 6.1
11: ENS888 3 1.2 0 0 1.6 3.3 0 6.1
12: ENS888 4 1.2 0 0 1.6 3.3 0 6.1
#take the column sums
df[, colsum := tra1+tr2+tr3+tra2+tr15+tr1b]
#take the row sums by ID
df[, rowsum := sum(colsum), by= ID]
> df
ID INDV tra1 tr2 tr3 tra2 tr15 tr1b colsum rowsum
1: ENS777 1 1.2 0 0 1.6 3.3 0 6.1 24.4
2: ENS777 2 1.2 0 0 1.6 3.3 0 6.1 24.4
3: ENS777 3 1.2 0 0 1.6 3.3 0 6.1 24.4
4: ENS777 4 1.2 0 0 1.6 3.3 0 6.1 24.4
5: ENS999 1 0.0 0 0 0.0 0.0 0 0.0 0.0
6: ENS999 2 0.0 0 0 0.0 0.0 0 0.0 0.0
7: ENS999 3 0.0 0 0 0.0 0.0 0 0.0 0.0
8: ENS999 4 0.0 0 0 0.0 0.0 0 0.0 0.0
9: ENS888 1 1.2 0 0 1.6 3.3 0 6.1 24.4
10: ENS888 2 1.2 0 0 1.6 3.3 0 6.1 24.4
11: ENS888 3 1.2 0 0 1.6 3.3 0 6.1 24.4
12: ENS888 4 1.2 0 0 1.6 3.3 0 6.1 24.4
# Now you need the unique IDs for which the row sums are zero
> unique(df[rowsum==0]$ID)
[1] "ENS999"
Upvotes: 1