Reputation: 2269
I have a data file like this, where the first two columns are corresponding gene names in human and rat(basically same name with different capitalization)
Human Rat RNAtype Exp Organ
BBS1 Bbs1 reg 7 Lung
ASAP2 Asap2 reg 5 Heart
ASAP2 Asap2 CANONICAL 5 Heart
ASAP2 Asap2 reg 6 Heart
ASAP2 Asap2 reg 8 Lung
ASAP3 Asap3 SCRAMBLED 5 Heart
ASAP3 Asap3 scram 8 Heart
ASAP3 Asap3 CANONICAL 5 Heart
ASAP3 Asap3 reg 5 Heart
Now I want to make subset of it in the following way. If a gene is present in both Lung and heart then I will include this in my subset. for example, here the gene in the first line, BBS1 is present in only Lung but not in Heart, so it won't be in the subset. But ASAP2 is present in both heart and lung. So, it will in the subset. ASAP3 won't be in the subset since it's not present in Lung.
Now the actual data file is pretty big. And there are lots of genes and the order of gene name coming in the first/second column is not ordered like the way here ie it can be ASAP2,ASAP3,ASAP3,BBS1,ASAP2, ASAP2.
Upvotes: 1
Views: 678
Reputation: 92282
A data.table
solution
library(data.table)
setDT(df)[, indx := length(unique(Organ)) >= 2, by = Rat][indx == TRUE]
## Human Rat RNAtype Exp Organ indx
## 1: ASAP2 Asap2 reg 5 Heart TRUE
## 2: ASAP2 Asap2 CANONICAL 5 Heart TRUE
## 3: ASAP2 Asap2 reg 6 Heart TRUE
## 4: ASAP2 Asap2 reg 8 Lung TRUE
With the courtesy of @Arun, here are couple even cooler data.table
solutions
setDT(df)[, .SD[length(unique(Organ)) >= 2L], by=Rat]
Or
setDT(df)[df[, .I[length(unique(Organ))>=2L], by=Rat]$V1]
Upvotes: 2
Reputation: 70256
If your Organ column only contains those two values Lung and Heart, here's another approach:
library(dplyr)
dfsub <- df %>%
group_by(Human) %>%
filter(length(unique(Organ)) >= 2)
#Source: local data frame [4 x 5]
#Groups: Human
#
# Human Rat RNAtype Exp Organ
#1 ASAP2 Asap2 reg 5 Heart
#2 ASAP2 Asap2 CANONICAL 5 Heart
#3 ASAP2 Asap2 reg 6 Heart
#4 ASAP2 Asap2 reg 8 Lung
Upvotes: 2
Reputation: 44525
Here's another strategy:
> u <- with(df, by(Human, Organ, unique))
> df[df$Human %in% intersect(u[[1]], u[[2]]), ]
Human Rat RNAtype Exp Organ
2 ASAP2 Asap2 reg 5 Heart
3 ASAP2 Asap2 CANONICAL 5 Heart
4 ASAP2 Asap2 reg 6 Heart
5 ASAP2 Asap2 reg 8 Lung
Upvotes: 3
Reputation: 17611
Try this:
# read in sample data
df<-read.table(text="
Human Rat RNAtype Exp Organ
BBS1 Bbs1 reg 7 Lung
ASAP2 Asap2 reg 5 Heart
ASAP2 Asap2 CANONICAL 5 Heart
ASAP2 Asap2 reg 6 Heart
ASAP2 Asap2 reg 8 Lung
ASAP3 Asap3 SCRAMBLED 5 Heart
ASAP3 Asap3 scram 8 Heart
ASAP3 Asap3 CANONICAL 5 Heart
ASAP3 Asap3 reg 5 Heart", header=TRUE)
merge(df[df$Organ=="Heart",], df[df$Organ=="Lung",], by=c("Human","Rat"))
Human Rat RNAtype.x Exp.x Organ.x RNAtype.y Exp.y Organ.y
1 ASAP2 Asap2 reg 5 Heart reg 8 Lung
2 ASAP2 Asap2 CANONICAL 5 Heart reg 8 Lung
3 ASAP2 Asap2 reg 6 Heart reg 8 Lung
Alternatively:
genes2keep <- intersect(df[df$Organ=="Heart",]$Human, df[df$Organ=="Lung",]$Human)
df[df$Human %in% genes2keep, ]
Human Rat RNAtype Exp Organ
2 ASAP2 Asap2 reg 5 Heart
3 ASAP2 Asap2 CANONICAL 5 Heart
4 ASAP2 Asap2 reg 6 Heart
5 ASAP2 Asap2 reg 8 Lung
Upvotes: 2