Reputation: 903
I have two data frames: one is a list of known genes involved in disease x, and the other is the array data. A simple example of the two data frames:
knownGene <- data.frame(geneID = c("gene1", "gene2", "gene3", "gene5", "gene5"), chr = c(16, 3, 4, 1, 10))
arrayData <- data.frame(sampleID = c("xyz1", "xyz1", "xyz1", "xyz1", "xyz2", "xyz2", "xyz2", "xyz2"), geneID = c("gene1", "gene3", "gene4", "gene2", "gene1", "gene3", "gene4", "gene5"))
The array data may list a gene multiple times (e.g. multiple samples may have variations in the same gene). Therefore, using match like this:
Matched<-arrayData[na.omit(match(knownGene$geneID, arrayData$geneID)),]
will only return the first match, i.e. at most one sample per gene is pulled out. If I use grep in a loop, I get a lot of genes that aren't in my knownGene, because grep pulls out every term that contains the search characters. My loop looks like this:
for (i in 1:length(knownGene$geneID)){
x<-arrayData[grep(knownGene[i,2],arrayData$geneID),]
df<-rbind(df,x)
}
Is there any way to use match like this in a loop (all my attempts have failed thus far), or to grep exact terms in a loop? I'm aware you can grep exact terms if the string is provided.
Upvotes: 0
Views: 1316
Reputation: 10167
I'm thinking you want:
arrayData[arrayData$geneID %in% knownGene$geneID,]
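To see why this differs from the match approach in the question, here is a minimal sketch on toy data. The data frames below are illustrative assumptions modelled on the question, not the asker's real data, and the names firstOnly and allRows are made up for the example.

```r
# Toy data modelled on the question; the values are illustrative assumptions.
knownGene <- data.frame(geneID = c("gene1", "gene2", "gene5"),
                        chr = c(16, 3, 10))
arrayData <- data.frame(
  sampleID = c("xyz1", "xyz1", "xyz1", "xyz2", "xyz2", "xyz2"),
  geneID   = c("gene1", "gene3", "gene4", "gene2", "gene1", "gene5")
)

# match() returns only the first position of each known gene in arrayData,
# so the second gene1 row (sample xyz2) is lost.
firstOnly <- arrayData[na.omit(match(knownGene$geneID, arrayData$geneID)), ]

# %in% tests every row of arrayData against the known genes,
# so repeated genes are all kept.
allRows <- arrayData[arrayData$geneID %in% knownGene$geneID, ]

nrow(firstOnly)  # 3
nrow(allRows)    # 4
```

No loop is needed, since %in% is vectorised over the whole geneID column.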
If you want to do the grep thing, you could replace this:
grep(knownGene[i,2],arrayData$geneID)
with this:
grep(paste0('^',knownGene[i,2],'$'),arrayData$geneID)
since ^ and $ match the beginning and end of the string, respectively.
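A small sketch of the difference, using made-up IDs (ids, loose and exact are hypothetical names for illustration):

```r
# Hypothetical IDs chosen so that "gene1" is a substring of several of them.
ids <- c("gene1", "gene10", "gene11", "xgene1")

# Unanchored pattern: matches every element that contains "gene1".
loose <- grep("gene1", ids)

# Anchored pattern: matches only the element that is exactly "gene1".
exact <- grep(paste0("^", "gene1", "$"), ids)

loose  # 1 2 3 4
exact  # 1
```

Note that the ID is still interpreted as a regular expression here, so this assumes the gene names contain no regex metacharacters such as "." or "+". If they might, a plain comparison like which(ids == "gene1") avoids the issue.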
Upvotes: 1