Reputation: 2593
I have a big data set like the following, and I want to do a simple search on it:
>mydata
ID TF
hsa-let-7a-1 SRF
hsa-let-7a-1 PPARG
hsa-let-7a-2 AREB6
hsa-let-7a-3 1-Oct
hsa-let-7a-3 SRF
hsa-let-7a-3 PPARG
hsa-let-7b SRF
. .
. .
. .
Question: for a given y <- c("hsa-let-7a-3","hsa-let-7a-1","hsa-let-7b"...), find the elements of y that have the same TF in mydata. Some elements of y might not be in ID, so only those elements of y that are in ID should be checked.
Finally, print them as clusters or groups, where all IDs in a group have an identical TF.
The problem is that mydata contains identical IDs with different TF values.
Can anybody help me implement this in R?
The output I expect from the above example is (all IDs in the output are in y, but not all elements of y are in ID):
ID TF
>group1 hsa-let-7a-1 SRF
hsa-let-7a-3 SRF
hsa-let-7b SRF
>group2 hsa-let-7a-1 PPARG
hsa-let-7a-3 PPARG
Upvotes: 1
Views: 145
Reputation: 633
You can also use the data.table library:
library(data.table)
mydata <- data.table(mydata)
# Key (sort) the table by TF
setkey(mydata, TF)
# Keep only the rows whose ID is in y
z <- mydata[ID %in% y]
# For each TF, flag whether it occurs in more than one of those rows
s <- z[, .N > 1, by = TF]
# The output of s will look like:
#       TF    V1
# 1: 1-Oct FALSE
# 2: PPARG  TRUE
# 3:   SRF  TRUE
# Keep only the rows whose TF was flagged TRUE
z[TF %in% s[V1 == TRUE]$TF]
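The two steps above can probably be collapsed into a single grouped expression. The following is a minimal, self-contained sketch, assuming the sample rows from the question; the ID "hsa-mir-0000" is made up here only to show that elements of y missing from ID are ignored.

```r
library(data.table)

# Sample rows copied from the question
mydata <- data.table(
  ID = c("hsa-let-7a-1", "hsa-let-7a-1", "hsa-let-7a-2", "hsa-let-7a-3",
         "hsa-let-7a-3", "hsa-let-7a-3", "hsa-let-7b"),
  TF = c("SRF", "PPARG", "AREB6", "1-Oct", "SRF", "PPARG", "SRF")
)

# IDs from the question plus one hypothetical ID that is not in mydata
y <- c("hsa-let-7a-3", "hsa-let-7a-1", "hsa-let-7b", "hsa-mir-0000")

# Filter to the IDs in y, then keep each TF group that has more than one row;
# groups for which the expression returns NULL are dropped from the result
res <- mydata[ID %in% y][, if (.N > 1) .SD, by = TF]
print(res)
```

Here .N is the number of rows in the current group and .SD is the subset of data for that group, so TFs that match only one ID (such as 1-Oct) disappear.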
Upvotes: 1
Reputation: 89057
Try this:
out <- subset(mydata, ID %in% y)
out <- split(out, out$TF)
out <- out[sapply(out, nrow) > 1]
It will return a list of data.frames, one for each TF with two or more matches.
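To see what that list looks like, here is a small self-contained sketch run on the sample rows from the question. The ID "hsa-mir-0000" is a made-up value, added only to show that elements of y absent from ID are simply ignored.

```r
# Sample rows copied from the question
mydata <- data.frame(
  ID = c("hsa-let-7a-1", "hsa-let-7a-1", "hsa-let-7a-2", "hsa-let-7a-3",
         "hsa-let-7a-3", "hsa-let-7a-3", "hsa-let-7b"),
  TF = c("SRF", "PPARG", "AREB6", "1-Oct", "SRF", "PPARG", "SRF"),
  stringsAsFactors = FALSE
)

# IDs from the question plus one hypothetical ID that is not in mydata
y <- c("hsa-let-7a-3", "hsa-let-7a-1", "hsa-let-7b", "hsa-mir-0000")

out <- subset(mydata, ID %in% y)      # keep rows whose ID is in y
out <- split(out, out$TF)             # one data.frame per TF
out <- out[sapply(out, nrow) > 1]     # drop TFs matched by a single row

# Print each remaining group under its TF name
for (tf in names(out)) {
  cat("TF:", tf, "\n")
  print(out[[tf]], row.names = FALSE)
}
```

On this sample, only the PPARG and SRF groups remain; 1-Oct is dropped because it matches a single ID.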
Upvotes: 1