How to add specific strings to a data.frame in R

Question

I have two data frames. One holds statistical output for my data, and the genes I am working with are referred to in it by a cluster ID. The other holds each cluster ID and its accompanying gene_id.

data.frame1 is a collection of unordered clusters with associated statistical data:

           X    baseMean 
cluster_1234         542
cluster_2546         764
cluster_3472         564

data.frame2 is sorted by cluster in ascending order. The associated gene_ids are in random order, but they let me link back to associated data in another data frame.

     gene_id  cluster_id 
  gene_69149   cluster_1
  gene_23478   cluster_2
  gene_92371   cluster_3

What I would like to do is add a column with the associated gene_id for each of my clusters by iterating through data.frame1$X. The output would be a new data frame with the genes of interest and their gene_ids. I should also point out that there are only 900 rows in data.frame1 but 53,000 rows in data.frame2. Another issue is that the number in each gene_id bears no relation to the number in its cluster ID. The result would look something like this:

  gene_id            X     baseMean
gene_5463 cluster_1234          542
gene_7934 cluster_2546          764
gene_8346 cluster_3472          564

I just want to add the associated gene_id in a new column next to the cluster IDs that are important.

akrun · Accepted Answer

We can use merge

merge(df1, df2, by.x='X', by.y='cluster_id')
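
As a minimal sketch of that call, here it is run on small stand-in data frames shaped like the ones in the question. The values, including the extra cluster_9999 row, are made up for illustration and are not the asker's real data.

```r
# Toy stand-ins for the two data frames in the question (illustrative values only)
df1 <- data.frame(X = c("cluster_3472", "cluster_1234", "cluster_2546"),
                  baseMean = c(564, 542, 764),
                  stringsAsFactors = FALSE)
df2 <- data.frame(gene_id = c("gene_5463", "gene_7934", "gene_8346", "gene_0001"),
                  cluster_id = c("cluster_1234", "cluster_2546", "cluster_3472", "cluster_9999"),
                  stringsAsFactors = FALSE)

# Match df1$X against df2$cluster_id; only clusters present in both are kept
res <- merge(df1, df2, by.x = "X", by.y = "cluster_id")

# Put gene_id first, as in the desired output
res <- res[, c("gene_id", "X", "baseMean")]
print(res)
```

By default merge sorts the result by the key column and drops rows of df1 that have no match in df2. Passing all.x = TRUE should keep those rows, with NA in gene_id.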

If we have a large dataset, another option is inner_join/left_join/full_join etc. (depending on the output wanted) from library(dplyr)

library(dplyr)
inner_join(df1, df2, by=c('X'='cluster_id'))
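
To illustrate how the choice of join changes the output, here is a rough sketch on made-up data, assuming dplyr is installed. The cluster_0000 row is a hypothetical cluster with no entry in df2.

```r
library(dplyr)

# Toy data (illustrative values only); cluster_0000 has no match in df2
df1 <- data.frame(X = c("cluster_1234", "cluster_2546", "cluster_0000"),
                  baseMean = c(542, 764, 100),
                  stringsAsFactors = FALSE)
df2 <- data.frame(gene_id = c("gene_5463", "gene_7934", "gene_8346"),
                  cluster_id = c("cluster_1234", "cluster_2546", "cluster_3472"),
                  stringsAsFactors = FALSE)

# inner_join keeps only the clusters found in both data frames
inner <- inner_join(df1, df2, by = c("X" = "cluster_id"))

# left_join keeps every row of df1; unmatched clusters get NA for gene_id
left <- left_join(df1, df2, by = c("X" = "cluster_id"))

print(inner)
print(left)
```

With 900 rows of interest, left_join may be the safer choice if every cluster should appear in the result even when its gene_id is missing.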
