Reputation: 398
I have two datasets:
loc <- c("a","b","c","d","e")
id1 <- c(NA,9,3,4,5)
id2 <- c(2,3,7,5,6)
id3 <- c(2,NA,5,NA,7)
cost1 <- c(10,20,30,40,50)
cost2 <- c(50,20,30,30,50)
cost3 <- c(40,20,30,10,20)
dt <- data.frame(loc,id1,id2,id3,cost1,cost2,cost3)
id <- c(1,2,3,4,5,6,7)
rate <- c(0.9,0.8,0.7,0.6,0.5,0.4,0.3)
lookupd_tb <- data.frame(id,rate)
what I want to do, is to match the values in dt with lookup_tb for id1,id2 and id3 and if there is a match, multiply rate for that id to its related cost.
This is my approach:
dt <- dt %>%
left_join(lookupd_tb , by=c("id1"="id")) %>%
dplyr :: mutate(cost1 = ifelse(!is.na(rate), cost1*rate, cost1)) %>%
dplyr :: select (-rate)
what I am doing now, works fine but I have to repeat it 3 times for each variable and I was wondering if there is a more efficient way to do this(probably using apply family?)
I tried to join all three variables with id in my look up table but when rate is joined with my dt, all the costs (cost1, cost2 and cost3) will be multiply by the same rate which I don't want.
I appreciate your help!
Upvotes: 5
Views: 154
Reputation: 17678
In tidyverse
you can also try an alternative approach by transforming the data from wide to long
library(tidyverse)
dt %>%
# data transformation to long
gather(k, v, -loc) %>%
mutate(ID=paste0("costnew", str_extract(k, "[:digit:]")),
k=str_remove(k, "[:digit:]")) %>%
spread(k, v) %>%
# left_join and calculations of new costs
left_join(lookupd_tb , by="id") %>%
mutate(cost_new=ifelse(is.na(rate), cost,rate*cost)) %>%
# clean up and expected output
select(loc, ID, cost_new) %>%
spread(ID, cost_new) %>%
left_join(dt,., by="loc") # or %>% bind_cols(dt, .)
loc id1 id2 id3 cost1 cost2 cost3 costnew1 costnew2 costnew3
1 a NA 2 2 10 50 40 10 40 32
2 b 9 3 NA 20 20 20 20 14 20
3 c 3 7 5 30 30 30 21 9 15
4 d 4 5 NA 40 30 10 24 15 10
5 e 5 6 7 50 50 20 25 20 6
The idea ist to bring the data in suitable long format for the lef_joining
using a gather
& spread
combination with new index columns k
and ID
. After the calculation we will transform to the expected output using a second spread
and binding to dt
Upvotes: 3
Reputation: 887951
A base R
approach would be to loop through the columns of 'id' using sapply/lapply
, get the match
ing index from the 'id' column of 'lookupd_tb', based on the index, get the corresponding 'rate', replace
the NA
elements with 1, multiply with 'cost' columns and update the 'cost' columns
nmid <- grep("id", names(dt))
nmcost <- grep("cost", names(dt))
dt[nmcost] <- dt[nmcost]*sapply(dt[nmid], function(x) {
x1 <- lookupd_tb$rate[match(x, lookupd_tb$id)]
replace(x1, is.na(x1), 1) })
Or using tidyverse
, we can loop through the sets of columns i.e. 'id' and 'cost' with purrr::map2
, then do the same approach as above. The only diference is that here we created new columns instead of updating the 'cost' columns
library(tidyverse)
dt %>%
select(nmid) %>%
map2_df(., dt %>%
select(nmcost), ~
.x %>%
match(., lookupd_tb$id) %>%
lookupd_tb$rate[.] %>%
replace(., is.na(.),1) * .y ) %>%
rename_all(~ paste0("costnew", seq_along(.))) %>%
bind_cols(dt, .)
Upvotes: 7