Reputation: 25
I have two data frames. One holds the statistical output for my data, and the genes I am working with are referred to by a cluster ID in this data frame. The other data frame has the cluster ID and the accompanying gene_id.
data.frame1 is a collection of unordered clusters with associated statistical data:
X baseMean
cluster_1234 542
cluster_2546 764
cluster_3472 564
data.frame2 is arranged by cluster in ascending order. The associated gene_ids are in a random order, but they let me link back to associated data in another data frame:
gene_id cluster_id
gene_69149 cluster_1
gene_23478 cluster_2
gene_92371 cluster_3
What I would like to do is add a column with the associated gene_id for each of my clusters by iterating through data.frame1$X. I should also point out that there are only 900 rows in data.frame1 but 53,000 rows in data.frame2. The other issue is that the numbers in each gene_id are unrelated to the numbers in the corresponding cluster ID. The output would be a new data frame with the clusters of interest and their gene_ids, something like what is below:
gene_id X baseMean
gene_5463 cluster_1234 542
gene_7934 cluster_2546 764
gene_8346 cluster_3472 564
I just want to add the associated gene_id in a new column next to the cluster IDs that are important.
Upvotes: 0
Views: 56
Reputation: 887118
We can use merge, matching the X column of the first data frame against the cluster_id column of the second:
merge(df1, df2, by.x='X', by.y='cluster_id')
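As a minimal sketch of what that call returns, here is a toy version of the two data frames. The values are copied from the question, with one extra row (cluster_1) in df2 to show that clusters missing from df1 are dropped; the names df1, df2 and result are only placeholders for illustration.

```r
# Toy data mimicking the layout in the question (values are illustrative)
df1 <- data.frame(
  X = c("cluster_1234", "cluster_2546", "cluster_3472"),
  baseMean = c(542, 764, 564),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  gene_id = c("gene_69149", "gene_5463", "gene_7934", "gene_8346"),
  cluster_id = c("cluster_1", "cluster_1234", "cluster_2546", "cluster_3472"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose cluster ID appears in both data frames
result <- merge(df1, df2, by.x = "X", by.y = "cluster_id")

# merge() puts the key column first; reorder to match the desired layout
result <- result[, c("gene_id", "X", "baseMean")]
print(result)
```

No explicit loop over data.frame1$X is needed, since merge does the lookup for every row at once.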
If the dataset is large, another option is inner_join/left_join/full_join, etc. (depending on the output wanted) from dplyr:
library(dplyr)
inner_join(df1, df2, by=c('X'='cluster_id'))
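To illustrate how the choice of join changes the output, here is a small sketch with made-up data in which one cluster (cluster_9999, a hypothetical ID) has no entry in df2. It assumes dplyr is installed; the names inner and left are placeholders.

```r
library(dplyr)

# Toy data: cluster_9999 is a hypothetical ID with no match in df2
df1 <- data.frame(
  X = c("cluster_1234", "cluster_2546", "cluster_9999"),
  baseMean = c(542, 764, 100),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  gene_id = c("gene_5463", "gene_7934"),
  cluster_id = c("cluster_1234", "cluster_2546"),
  stringsAsFactors = FALSE
)

# inner_join drops rows of df1 that have no match in df2
inner <- inner_join(df1, df2, by = c("X" = "cluster_id"))

# left_join keeps every row of df1 and fills gene_id with NA when unmatched
left <- left_join(df1, df2, by = c("X" = "cluster_id"))

print(inner)
print(left)
```

If every cluster in the first data frame is expected to have a gene_id, left_join followed by a check for NA values is one way to spot clusters that failed to match.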
Upvotes: 1