user2165857
user2165857

Reputation: 2690

Add column to table with data from another table

I have a table that looks as follows:

Table1 <- data.frame(
    "Random" = c("A", "B", "C"), 
    "Genes" = c("Apple", "Candy", "Toothpaste"), 
    "Extra" = c("Up", "", "Down"), 
    "Desc" = c("Healthy,Red,Fruit", "Sweet,Cavities,Sugar,Fruity", "Minty,Dentist")
)

giving:

  Random      Genes Extra                       Desc
1      A      Apple    Up          Healthy,Red,Fruit
2      B      Candy       Sweet,Cavities,Sugar,Fruity
3      C Toothpaste  Down              Minty,Dentist

I have another table with Descriptions and want to add a columns with Genes. For instance Table2 would be:

Table2 <- data.frame(
    "Col1" = c(1, 2, 3, 4, 5, 6), 
    "Desc" = c("Sweet", "Sugar", "Dentist", "Red", "Fruit", "Fruity")
)

Giving:

  Col1    Desc
1    1   Sweet
2    2   Sugar
3    3 Dentist
4    4     Red
5    5   Fruit
6    6  Fruity

I want to add another column to Table2 named "Genes" that matches the "Desc" from both tables and adds the Genes from Table1 to get:

  Col1    Desc    Gene
1    1   Sweet    Candy
2    2   Sugar    Candy
3    3 Dentist    Toothpaste
4    4     Red    Apple
5    5   Fruit    Apple
6    6  Fruity    Candy

Upvotes: 12

Views: 12937

Answers (6)

sbha
sbha

Reputation: 10422

Assuming there aren't too many terms to match, here's an option using some tidyverse functions:

library(tidyverse)
crossing(Table1, Table2) %>% 
  mutate_if(is.factor, as.character) %>% 
  rowwise() %>% 
  filter(str_detect(Desc, Desc1)) %>% 
  select(Col1, Desc = Desc1, Genes) %>% 
  arrange(Col1)

# A tibble: 7 x 3
   Col1 Desc    Genes     
  <dbl> <chr>   <chr>     
1     1 Sweet   Candy     
2     2 Sugar   Candy     
3     3 Dentist Toothpaste
4     4 Red     Apple     
5     5 Fruit   Apple     
6     5 Fruit   Candy     
7     6 Fruity  Candy 

Upvotes: 0

Tyler Rinker
Tyler Rinker

Reputation: 109844

If we can get a key lookup into a named list or 2 vectors (e.g., 2 column data frame) we can use the %l% function int he *qdapTools** package I maintain. First I'll split up your Table1$desc into a named list using the strsplit function. That's out key. The we can do the lookup via Table2$Desc. This uses the *data.table** package in the backend so it's pretty speedy:

library(qdapTools)

key <- setNames(strsplit(as.character(Table1[["Desc"]]), "\\s*,\\s*"), Table1[["Genes"]])

## $Apple
## [1] "Healthy" "Red"     "Fruit"  
## 
## $Candy
## [1] "Sweet"    "Cavities" "Sugar"    "Fruity"  
## 
## $Toothpaste
## [1] "Minty"   "Dentist"

Table2[["Gene"]] <- Table2[["Desc"]] %l% key

##   Col1    Desc       Gene
## 1    1   Sweet      Candy
## 2    2   Sugar      Candy
## 3    3 Dentist Toothpaste
## 4    4     Red      Apple
## 5    5   Fruit      Apple
## 6    6  Fruity      Candy

Here's a pure base vector lookup that should also be pretty speedy:

x <- strsplit(as.character(Table1[["Desc"]]), "\\s*,\\s*")
key <- setNames(rep(Table1[["Genes"]], sapply(x, length)), unlist(x))
Table2[["Gene"]] <- key[match(Table2[["Desc"]], names(key))]

Upvotes: 4

Frank
Frank

Reputation: 66819

Following @TylerRinker's answer, I'd format the Table1$Desc string first:

Table1a        <- with(Table1,
                    stack(setNames(sapply(as.character(Desc),strsplit,split=","),Genes)))
names(Table1a) <- c("Desc","Genes")

Then go to data.table:

require(data.table)
DT1 <- data.table(Table1a,key="Desc")
DT2 <- data.table(Table2,key="Desc")

Then merge-n-define:

DT2[DT1,Gene:=Genes]
#    Col1    Desc       Gene
# 1:    3 Dentist Toothpaste
# 2:    5   Fruit      Apple
# 3:    6  Fruity      Candy
# 4:    4     Red      Apple
# 5:    2   Sugar      Candy
# 6:    1   Sweet      Candy

Upvotes: 3

Jthorpe
Jthorpe

Reputation: 10167

Here is a method in base R that uses an intermediate linking table:

# create an intermediate data.frame with all the key (Desc) / value (Gene) pairs
df  <-  NULL
for(i in seq(nrow(Table1)))
    df  <-  rbind(df,
                  data.frame(Gene =Table1$Genes[i],
                            Desc =strsplit(as.character(Table1$Desc)[i],',')[[1]]))
df 
#>         Gene     Desc
#> 1      Apple  Healthy
#> 2      Apple      Red
#> 3      Apple    Fruit
#> 4      Candy    Sweet
#> 5      Candy Cavities
#> 6      Candy    Sugar
#> 7      Candy   Fruity
#> 8 Toothpaste    Minty
#> 9 Toothpaste  Dentist

Now link to it in the usual way:

Table2$Gene  <-  df$Gene[match(Table2$Desc,df$Desc)]

Upvotes: 6

TARehman
TARehman

Reputation: 6749

Assuming that each string is unique (i.e. that Fruit can't appear for more than one Gene), you can do this fairly easily using a for loop and grep. It may be slow on a huge dataset however.

options(stringsAsFactors = FALSE)
Table1 <- data.frame("Random" = c("A", "B", "C"), "Genes" = c("Apple", "Candy", "Toothpaste"), "Extra" = c("Up", "", "Down"), "Desc" = c("Healthy,Red,Fruit", "Sweet,Cavities,Sugar,Fruity", "Minty,Dentist"))
Table2 <- data.frame("Col1" = c(1, 2, 3, 4, 5, 6), "Desc" = c("Sweet", "Sugar", "Dentist", "Red", "Fruit", "Fruity"))

Table2$Gene <- NA
for(x in 1:nrow(Table2)) {

    Table2[x,"Gene"] <- Table1$Genes[grep(pattern = paste("\\b",Table2$Desc[x],"\\b",sep=""),x = Table1$Desc)]
}
Table2

  Col1    Desc       Gene
1    1   Sweet      Candy
2    2   Sugar      Candy
3    3 Dentist Toothpaste
4    4     Red      Apple
5    5   Fruit      Apple
6    6  Fruity      Candy

Upvotes: 3

akrun
akrun

Reputation: 886998

You could try cSplit from splitstackshape to split the 'Desc' column in "Table1" and convert the dataset from 'wide' to 'long' format. The output will be a data.table. We can use the data.table methods for setting the key column as 'Desc' (setkey), join with "Table2", and finally remove the columns that are not needed in the output either by selecting the columns or assigning (:=) the unwanted columns to NULL

library(splitstackshape)
setkey(cSplit(Table1, 'Desc', ',', 'long'),Desc)[Table2[2:1]][
                   ,c(5,4,2), with=FALSE]
#  Col1    Desc      Genes
#1:    1   Sweet      Candy
#2:    2   Sugar      Candy
#3:    3 Dentist Toothpaste
#4:    4     Red      Apple
#5:    5   Fruit      Apple
#6:    6  Fruity      Candy

Upvotes: 8

Related Questions