user2806363

Reputation: 2593

Implementation of a simple database search in R

I have a large data set like the following, and I want to do a simple search on it:

>mydata

    ID              TF

    hsa-let-7a-1    SRF
    hsa-let-7a-1    PPARG
    hsa-let-7a-2    AREB6
    hsa-let-7a-3    1-Oct
    hsa-let-7a-3    SRF
    hsa-let-7a-3    PPARG
    hsa-let-7b      SRF
    .               .
    .               .
    .               .
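
For anyone who wants to reproduce this, the visible rows can be built as a data frame roughly like this. This is only a sketch: it contains just the seven rows shown above (the elided rows are left out), and it assumes both columns are plain character vectors.

```r
# Sample data: only the seven rows visible above; the elided rows are omitted.
# Assumption: ID and TF are stored as character columns, not factors.
mydata <- data.frame(
  ID = c("hsa-let-7a-1", "hsa-let-7a-1", "hsa-let-7a-2", "hsa-let-7a-3",
         "hsa-let-7a-3", "hsa-let-7a-3", "hsa-let-7b"),
  TF = c("SRF", "PPARG", "AREB6", "1-Oct", "SRF", "PPARG", "SRF"),
  stringsAsFactors = FALSE
)
print(mydata)
```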

Question: For a given y <- c("hsa-let-7a-3","hsa-let-7a-1","hsa-let-7b"...), find the elements of y that share the same TF in mydata. Some elements of y might not be in ID, so only the elements of y that appear in ID should be checked.

Finally, print them as clusters or groups, where the IDs in each group have an identical TF. The problem is that mydata contains identical IDs with different TFs. Can anybody help me implement this in R? The output I expect from the above example is (all IDs in the output are in y, but not all elements of y are in ID):

            ID                TF

>group1       hsa-let-7a-1      SRF
              hsa-let-7a-3      SRF
              hsa-let-7b        SRF

>group2       hsa-let-7a-1      PPARG
              hsa-let-7a-3      PPARG

Upvotes: 1

Views: 145

Answers (2)

Asayat

Reputation: 633

You can also use the data.table library:

library(data.table)
mydata <- as.data.table(mydata)
# Sort by TF (setkey orders the table by the key column)
setkey(mydata, TF)
# Keep only the rows whose ID is in y
z <- mydata[ID %in% y]
# For each TF, flag whether it has more than one row
s <- z[, .N > 1, by = TF]
# The output of s will look like:
#       TF    V1
# 1: 1-Oct FALSE
# 2: PPARG  TRUE
# 3:   SRF  TRUE
# Keep the rows whose TF is matched by more than one ID
z[TF %in% s[V1 == TRUE]$TF]
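
The same filtering can probably be done in a single step by returning .SD only for groups with more than one row. The sketch below assumes data.table is installed, uses only the seven rows shown in the question, and takes y to be just the three IDs listed there; `res` is a name chosen here for illustration.

```r
library(data.table)

# Assumed sample data: the seven rows visible in the question.
mydata <- data.table(
  ID = c("hsa-let-7a-1", "hsa-let-7a-1", "hsa-let-7a-2", "hsa-let-7a-3",
         "hsa-let-7a-3", "hsa-let-7a-3", "hsa-let-7b"),
  TF = c("SRF", "PPARG", "AREB6", "1-Oct", "SRF", "PPARG", "SRF")
)
# Assumed query vector: only the three IDs spelled out in the question.
y <- c("hsa-let-7a-3", "hsa-let-7a-1", "hsa-let-7b")

# Filter to the IDs in y, then keep a TF group only if it has more than one row.
# Groups for which the `if` yields NULL are dropped from the result.
res <- mydata[ID %in% y, if (.N > 1) .SD, by = TF]
print(res)
```

This avoids the intermediate table s, at the cost of being a little less explicit about which TFs were dropped.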

Upvotes: 1

flodel

Reputation: 89057

Try this:

out <- subset(mydata, ID %in% y)
out <- split(out, out$TF)
out <- out[sapply(out, nrow) > 1]

It will return a list of data.frames, one per TF with two matches or more.
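
As a quick check, here is the same approach run end to end on the rows shown in the question. This is a sketch under assumptions: mydata holds only the seven visible rows, and y is the three listed IDs plus one hypothetical ID ("hsa-mir-999") that is not in mydata, to show that unmatched elements of y are simply ignored.

```r
# Assumed sample data: the seven rows visible in the question.
mydata <- data.frame(
  ID = c("hsa-let-7a-1", "hsa-let-7a-1", "hsa-let-7a-2", "hsa-let-7a-3",
         "hsa-let-7a-3", "hsa-let-7a-3", "hsa-let-7b"),
  TF = c("SRF", "PPARG", "AREB6", "1-Oct", "SRF", "PPARG", "SRF"),
  stringsAsFactors = FALSE
)
# Assumed query vector; "hsa-mir-999" is a made-up ID absent from mydata.
y <- c("hsa-let-7a-3", "hsa-let-7a-1", "hsa-let-7b", "hsa-mir-999")

out <- subset(mydata, ID %in% y)    # keep rows whose ID is in y
out <- split(out, out$TF)           # one data.frame per TF
out <- out[sapply(out, nrow) > 1]   # drop TFs matched by only one row

print(names(out))
print(out)
```

If a single data.frame is preferred over a list, do.call(rbind, out) stacks the groups back together.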

Upvotes: 1
