Jose Gracia Rodriguez
Jose Gracia Rodriguez

Reputation: 167

Extracting two columns from a data frame and merging them with a second data frame

I'm having some trouble working with data frames.

I have a data frame (df1, see image 1), from which I need to extract the IDs from the last column (comments), this I can do with the next line:

genes <- sapply(df1$Comment, function(x){
  paste(substring(x, as.numeric(gregexpr("SL1344_RS", x)[[1]]), 
                  as.numeric(gregexpr("SL1344_RS", x)[[1]]) + 13), collapse=";")
  })

now I need to merge this ID with their correspondent frequency value, under the column "freq", so I end up with two columns, one with the ID and the next with their frequency value.

Finally, I need to change the ID with their protein name, this information It´s within df2 (see image), where you can see each ID is related to a gene product under the column of "Product".

Here is the code that generates df1 and df2.

df1 <- data.frame(
  gene = c("SL1344_RS000040", "SL1344_RS000095", "SL1344_RS000105"),
  freq = c(7, 21 , 51))

df2 <- data.frame(
  code = c("SL1344_RS000095", "SL1344_RS000040", "SL1344_RS000105"),
  product = c("Product C", "Product A", "Product B"))

Upvotes: 0

Views: 97

Answers (1)

MacOS
MacOS

Reputation: 1159

now I need to merge this ID with their correspondent frequency value, under the column "freq", so I end up with two columns, one with the ID and the next with their frequency value.

You can achieve that by simply doing

genes.freq <- cbind(genes, df1$freq)

Finally, I need to change the ID with their protein name, this information It´s within df2 (see image), where you can see each ID is related to a gene product under the column of "Product".

This is simply a search in df2 where you want to get the index of the match.

names <- df2$Product[ match(genes.freq$gene, df2$Code) ]

genes.freq.names <- cbind(
    genes.freq, names)

Here is a simple toy example

df1 <- data.frame(
  gene = c("SL1344_RS000040", "SL1344_RS000095", "SL1344_RS000105"),
  freq = c(7, 21 , 51))

df2 <- data.frame(
  code = c("SL1344_RS000095", "SL1344_RS000040", "SL1344_RS000105"),
  product = c("Product C", "Product A", "Product B"))

df1$Product.Names <- df2$product[ match(df1$gene, df2$code) ]

df1

I hope this helps!

Upvotes: 1

Related Questions