George

Reputation: 903

looping match function in R

I have two data frames: one is a list of known genes involved in disease x, and the other is the array data. A simple example of the two data frames:

knownGene <- data.frame(geneID = c("gene1", "gene2", "gene3", "gene5", "gene5"), chr = c(16, 3, 4, 1, 10))

arrayData <- data.frame(sampleID = c("xyz1", "xyz1", "xyz1", "xyz2", "xyz2", "xyz2", "xyz2", "xyz2"), geneID = c("gene1", "gene3", "gene4", "gene2", "gene1", "gene3", "gene4", "gene5"))

The array data may list the same gene multiple times (e.g. multiple samples may have variations in the same gene). Therefore, doing

Matched<-arrayData[na.omit(match(knownGene$geneID, arrayData$geneID)),]

returns only the first match for each gene, i.e. at most one sample per gene is pulled out. If I use grep in a loop, I get a lot of genes that aren't in knownGene, because grep matches any ID that merely contains the search term (e.g. gene1 also matches gene10). My loop looks like this:

df <- data.frame()
for (i in seq_along(knownGene$geneID)) {
  x <- arrayData[grep(knownGene$geneID[i], arrayData$geneID), ]
  df <- rbind(df, x)
}
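A minimal sketch of the two pitfalls described above, using a small made-up vector (the IDs are hypothetical, not from the original data): match() stops at the first hit, grep() also returns substring hits, and a plain equality test with which() returns every exact occurrence.

```r
# Hypothetical IDs: "gene1" occurs twice, and "gene10" contains "gene1"
ids <- c("gene1", "gene3", "gene1", "gene10")

# match() returns only the position of the FIRST occurrence
first_hit <- match("gene1", ids)

# grep() treats the pattern as a regular expression and matches substrings,
# so "gene10" (position 4) is returned as well
substring_hits <- grep("gene1", ids)

# An exact comparison returns every occurrence and nothing else
exact_hits <- which(ids == "gene1")

print(first_hit)       # [1] 1
print(substring_hits)  # [1] 1 3 4
print(exact_hits)      # [1] 1 3
```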

Is there a way either to use match like this in a loop (all my attempts have failed so far), or to grep exact terms in a loop? I'm aware you can grep exact terms when the string itself is provided.

Upvotes: 0

Views: 1316

Answers (1)

Jthorpe

Reputation: 10167

I'm thinking you want:

arrayData[arrayData$geneID %in% knownGene$geneID,]
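A small sketch of this on the question's example data. It assumes the sampleID column has eight entries (three xyz1, five xyz2) so that it has the same length as geneID; the original listing had one fewer. Because %in% tests each row's geneID against the whole known list, every sample carrying a known gene is kept, and the duplicated gene5 in knownGene does not duplicate any rows.

```r
# Example data from the question (sampleID padded to eight entries: an assumption)
knownGene <- data.frame(geneID = c("gene1", "gene2", "gene3", "gene5", "gene5"),
                        chr = c(16, 3, 4, 1, 10),
                        stringsAsFactors = FALSE)
arrayData <- data.frame(sampleID = c(rep("xyz1", 3), rep("xyz2", 5)),
                        geneID = c("gene1", "gene3", "gene4",
                                   "gene2", "gene1", "gene3", "gene4", "gene5"),
                        stringsAsFactors = FALSE)

# TRUE for every row whose geneID appears anywhere in knownGene$geneID
keep <- arrayData$geneID %in% knownGene$geneID

# All matching rows are returned, not just the first one per gene
matched <- arrayData[keep, ]
print(matched)
```

Only the two gene4 rows are dropped, leaving six rows.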

If you want to do the grep thing, you could replace this:

grep(knownGene$geneID[i], arrayData$geneID)

with this:

grep(paste0('^', knownGene$geneID[i], '$'), arrayData$geneID)

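since ^ and $ match the beginning and end of the string, respectively, so the pattern only matches IDs identical to the search term (as long as the ID contains no regex metacharacters).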
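A sketch of the anchored-grep version as a complete loop, written with lapply() and a single do.call(rbind, ...) instead of growing a data frame inside a for loop. It uses the same assumed example data as the question (sampleID padded to eight entries). The IDs are deduplicated first so the repeated gene5 is not collected twice.

```r
# Example data from the question (sampleID padded to eight entries: an assumption)
knownGene <- data.frame(geneID = c("gene1", "gene2", "gene3", "gene5", "gene5"),
                        chr = c(16, 3, 4, 1, 10),
                        stringsAsFactors = FALSE)
arrayData <- data.frame(sampleID = c(rep("xyz1", 3), rep("xyz2", 5)),
                        geneID = c("gene1", "gene3", "gene4",
                                   "gene2", "gene1", "gene3", "gene4", "gene5"),
                        stringsAsFactors = FALSE)

# Query each known gene only once
ids <- unique(knownGene$geneID)

# For each gene, pull every row whose geneID matches it exactly:
# "^" and "$" anchor the regex to the whole string
hits <- lapply(ids, function(id) {
  arrayData[grep(paste0("^", id, "$"), arrayData$geneID), ]
})

# Bind the per-gene pieces once at the end
matched_grep <- do.call(rbind, hits)
print(matched_grep)
```

The rows come out grouped by gene rather than in their original order. If gene IDs could contain regex metacharacters such as ".", replacing the grep() call with which(arrayData$geneID == id) gives the same exact matching without any regex.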

Upvotes: 1
