ayush varshney
ayush varshney

Reputation: 537

Look up of comma separated data in another table

I do have 2 tables where in the table data frame consist data with comma separated values.

Need to find out which category present in lookup data frame i.e. in Option 1 out of "Cat A, Cat B, Cat C" which category is present. if found, need to show the score against it.

incase multiple instance found, we need to show the score of first occurrence hence in first case Cat C and Cat Z both comes under Option 1 however we need to show the score for Cat C only which is 2.

Table

Option           | Cat 
-----             ------   
Option 1         | Cat A, Cat B, Cat C, Cat Z
Option 2         | Cat X, Cat Y, Cat Z, Cat B
Option 3         | Cat P, Cat Q, Cat R, Cat S, Cat T 
Option 1         | Cat T 
Option 3         | Cat E, Cat F 

Lookup

Cat   |   Option    |   Score

Cat A |   Option 2   |   8
Cat B |   Option 3   |   3
Cat C |   Option 1   |   2
Cat X |   Option 3   |   9
Cat Y |   Option 1   |   1
Cat Z |   Option 2   |   1
Cat P |   Option 3   |   2
Cat Q |   Option 2   |   9
Cat R |   Option 1   |   4
Cat S |   Option 4   |   0
Cat T |   Option 1   |   5
Cat E |   Option 4   |   1
Cat F |   Option 3   |   10

Output

 Option      |     Cat                                |     Found_cat    |     Score
 Option 1    |     Cat A, Cat B, Cat C                |     Cat C        |     2
 Option 2    |     Cat X, Cat Y, Cat Z, Cat B          |     Cat Z        |     1
 Option 3    |     Cat P, Cat Q, Cat R, Cat S, Cat T  |     Cat P        |     2
 Option 1    |     Cat T                              |     Cat T        |     5
 Option 3    |     Cat E, Cat F                       |     Cat F        |     10

R Code for creating dataframe

Table <- data.frame(Option = c("Option 1", "Option 2", "Option 3", "Option 1", "Option 3"),
                 Cat = c("Cat A, Cat B, Cat C", "Cat X, Cat Y,Cat Z, Cat B", "Cat P, Cat Q, Cat R, Cat S, Cat T", "Cat T", "Cat E, Cat F"))

lookup <-data.frame(Cat = c("Cat A", "Cat B", "Cat C", "Cat X", "Cat Y", "Cat Z", "Cat P", "Cat Q", "Cat R", "Cat S", "Cat T", "Cat E", "Cat F"),
                    Options = c("Option 2", "Option 3", "Option 1", "Option 3", "Option 1", "Option 2", "Option 3", "Option 2", "Option 1", "Option 4", "Option 1", "Option 4", "Option 3"),
                    Score = c(8, 3, 2, 9, 1, 1, 2, 9, 4, 0, 5, 1, 10)) 



output <- data.frame(Option = c("Option 1", "Option 2", "Option 3", "Option 1", "Option 3"),
                     Cat = c("Cat A, Cat B, Cat C", "Cat X, Cat Y, Cat Z, Cat B,", "Cat P, Cat Q, Cat R, Cat S, Cat T", "Cat T", "Cat E, Cat F"),
                     Found_cat = c("Cat C", "Cat Z", "Cat P", "Cat T", "Cat F"),
                     Score = c(2, 1, 2, 5, 10))

Upvotes: 2

Views: 61

Answers (2)

Sotos
Sotos

Reputation: 51592

Here is a tidyverse option,

library(tidyverse)

Table %>% 
  unnest(Cat = strsplit(as.character(Cat), ', ')) %>% 
  inner_join(lookup, by = c('Option', 'Cat')) %>%
  select(Cat, Score) %>% 
  rename(Cat_Found = Cat) %>% 
  bind_cols(Table, .)

#    Option                               Cat Cat_Found Score
#1 Option 1               Cat A, Cat B, Cat C     Cat C     2
#2 Option 2        Cat X, Cat Y, Cat Z, Cat B     Cat Z     1
#3 Option 3 Cat P, Cat Q, Cat R, Cat S, Cat T     Cat P     2
#4 Option 1                             Cat T     Cat T     5
#5 Option 3                      Cat E, Cat F     Cat F    10

Upvotes: 1

Sean Lin
Sean Lin

Reputation: 815

I just give a quick try using loop:

Table <- data.frame(Option = c("Option 1", "Option 2", "Option 3", "Option 1", "Option 3"),
                Cat = c("Cat A, Cat B, Cat C", "Cat X, Cat Y, Cat Z, Cat B", "Cat P, Cat Q, Cat R, Cat S, Cat T", "Cat T", "Cat E, Cat F"), 
                stringsAsFactors = F)

lookup <-data.frame(Cat = c("Cat A", "Cat B", "Cat C", "Cat X", "Cat Y", "Cat Z", "Cat P", "Cat Q", "Cat R", "Cat S", "Cat T", "Cat E", "Cat F"),
                Options = c("Option 2", "Option 3", "Option 1", "Option 3", "Option 1", "Option 2", "Option 3", "Option 2", "Option 1", "Option 4", "Option 1", "Option 4", "Option 3"),
                Score = c(8, 3, 2, 9, 1, 1, 2, 9, 4, 0, 5, 1, 10), 
                stringsAsFactors = F)
app = matrix(nrow = nrow(Table), ncol = 2)
for (i in 1:nrow(Table)) {
lookup.cats = lookup$Cat[lookup$Options == Table$Option[i]]
Table.cats = unlist(strsplit(Table$Cat[i], split = ', '))
found.cat = intersect(lookup.cats, Table.cats)
score = lookup$Score[which(lookup$Cat == found.cat & 
                               lookup$Options == Table$Option[i])]
app[i, 1] = found.cat
app[i, 2] = score
}
app = as.data.frame(app)
names(app) = c('Found_cat', 'Score')
cbind(Table, app)

Upvotes: 2

Related Questions