Rio
Rio

Reputation: 398

How to match values of several variables to a variable in a look up table?

I have two datasets:

loc <- c("a","b","c","d","e")
id1 <- c(NA,9,3,4,5)
id2 <- c(2,3,7,5,6)
id3 <- c(2,NA,5,NA,7)
cost1 <- c(10,20,30,40,50)
cost2 <- c(50,20,30,30,50)
cost3 <- c(40,20,30,10,20)
dt <- data.frame(loc,id1,id2,id3,cost1,cost2,cost3)


id <- c(1,2,3,4,5,6,7)
rate <- c(0.9,0.8,0.7,0.6,0.5,0.4,0.3)
lookupd_tb <- data.frame(id,rate)

what I want to do, is to match the values in dt with lookup_tb for id1,id2 and id3 and if there is a match, multiply rate for that id to its related cost.

This is my approach:

dt <- dt %>% 
left_join(lookupd_tb , by=c("id1"="id")) %>%
dplyr :: mutate(cost1 = ifelse(!is.na(rate), cost1*rate, cost1)) %>% 
dplyr :: select (-rate)

what I am doing now, works fine but I have to repeat it 3 times for each variable and I was wondering if there is a more efficient way to do this(probably using apply family?)

I tried to join all three variables with id in my look up table but when rate is joined with my dt, all the costs (cost1, cost2 and cost3) will be multiply by the same rate which I don't want.

I appreciate your help!

Upvotes: 5

Views: 154

Answers (2)

Roman
Roman

Reputation: 17678

In tidyverse you can also try an alternative approach by transforming the data from wide to long

  library(tidyverse)
  dt %>% 
  # data transformation to long
  gather(k, v, -loc) %>% 
  mutate(ID=paste0("costnew", str_extract(k, "[:digit:]")),
         k=str_remove(k, "[:digit:]")) %>% 
  spread(k, v) %>% 
  # left_join and calculations of new costs
  left_join(lookupd_tb , by="id") %>% 
  mutate(cost_new=ifelse(is.na(rate), cost,rate*cost)) %>% 
  #  clean up and expected output
  select(loc, ID, cost_new) %>% 
  spread(ID, cost_new) %>% 
  left_join(dt,., by="loc")  # or %>% bind_cols(dt, .)
  loc id1 id2 id3 cost1 cost2 cost3 costnew1 costnew2 costnew3
1   a  NA   2   2    10    50    40       10       40       32
2   b   9   3  NA    20    20    20       20       14       20
3   c   3   7   5    30    30    30       21        9       15
4   d   4   5  NA    40    30    10       24       15       10
5   e   5   6   7    50    50    20       25       20        6

The idea ist to bring the data in suitable long format for the lef_joining using a gather & spread combination with new index columns k and ID. After the calculation we will transform to the expected output using a second spread and binding to dt

Upvotes: 3

akrun
akrun

Reputation: 887951

A base R approach would be to loop through the columns of 'id' using sapply/lapply, get the matching index from the 'id' column of 'lookupd_tb', based on the index, get the corresponding 'rate', replace the NA elements with 1, multiply with 'cost' columns and update the 'cost' columns

nmid <- grep("id", names(dt))
nmcost <- grep("cost", names(dt))

dt[nmcost] <- dt[nmcost]*sapply(dt[nmid], function(x) {
         x1 <- lookupd_tb$rate[match(x, lookupd_tb$id)]
          replace(x1, is.na(x1), 1) })

Or using tidyverse, we can loop through the sets of columns i.e. 'id' and 'cost' with purrr::map2, then do the same approach as above. The only diference is that here we created new columns instead of updating the 'cost' columns

library(tidyverse)
dt %>% 
   select(nmid) %>% 
   map2_df(., dt %>% 
               select(nmcost), ~  
                 .x %>% 
                     match(., lookupd_tb$id) %>%
                     lookupd_tb$rate[.] %>% 
                     replace(., is.na(.),1) * .y ) %>%
    rename_all(~ paste0("costnew", seq_along(.))) %>%
    bind_cols(dt, .)

Upvotes: 7

Related Questions