ogondiaz
ogondiaz

Reputation: 59

dplyr - use join as vlookup to capture some value from different data frame

I need to capture some values from dataframe A (colname : "price") and put them in dataframe B ("PRECIO_LISTA"). Same with column "sale_price" from dataframe A, put the value in column "PRECIO_INDEXADO" in dataframe B.

dataframe A

df_a <-  structure(list(TIENDA = c("CURACAO", "CURACAO", "CURACAO", "CURACAO", 
"CURACAO", "CURACAO", "CURACAO", "CURACAO", "CURACAO"), SKU = c("4896 PG", 
"4896 PG", "4896 PG", "4896 PG", "4896 PG", "4896 PG", "4896 PG", 
"4896 PG", "4896 PG"), NOMBRE = c("Ventilador 3en1 Air Monster 4896 40cm", 
"Ventilador 3en1 Air Monster 4896 40cm", "Ventilador 3en1 Air Monster 4896 40cm", 
"Ventilador 3en1 Air Monster 4896 40cm", "Ventilador 3en1 Air Monster 4896 40cm", 
"Ventilador 3en1 Air Monster 4896 40cm", "Ventilador 3en1 Air Monster 4896 40cm", 
"Ventilador 3en1 Air Monster 4896 40cm", "Ventilador 3en1 Air Monster 4896 40cm"
), PRECIO_OFERTA = c(29, 29, 29, 29, 29, 29, 29, 29, 29), PRECIO_LISTA = c(80, 
80, 80, 80, 80, 80, 80, 80, 80), PRECIO_INDEXADO = c(29, 29, 
29, 29, 29, 29, 29, 29, 29)), .Names = c("TIENDA", "SKU", "NOMBRE", 
"PRECIO_OFERTA", "PRECIO_LISTA", "PRECIO_INDEXADO"), row.names = c(NA, 
-9L), class = c("tbl_df", "tbl", "data.frame"))

data frame B

df_b <- structure(list(id = "4896 PG", title = "Ventilador 3en1 Air Monster 4896 40cm", 
    description = "Tu mejor aliado contra los días de intenso calor... este ventilador 3 en 1 Air Monster.", 
    google_product_category = NA, link = "https://www.lacuracao.pe/curacao/ventilador-3en1-air-monster-4896-40cm--4896-pg", 
    image_link = "http://www.lacuracao.pe/wcsstore/efe_cat_as/646x1000/4896 PG_1.jpg", 
    additional_image_link = NA, availability = 1, price = 80, 
    sale_price = 49), .Names = c("id", "title", "description", 
"google_product_category", "link", "image_link", "additional_image_link", 
"availability", "price", "sale_price"), row.names = c(NA, -1L
), class = c("tbl_df", "tbl", "data.frame"))

Desired ouput:

Data Frame B (df_b) with the values from data Frame A. The data provided is just for 1 product with SKU (id): "4896 PG".

Right now, df_b has values: "price" = 80, "sale_price" = 49. But it should be: "price": 80, "sale_price": 29.

I tried this without success:

desired_result <- inner_join(df_b,
                             df_a,
                             by = c("id" ="SKU", "price" = "PRECIO_LISTA",
                                                 "sale_price"  = "PRECIO_INDEXADO"))

Upvotes: 1

Views: 1039

Answers (2)

Tung
Tung

Reputation: 28391

Borrowed from this question, you can use data.table to do it

library(data.table)
setDT(df_a) # convert to a data.table without copy
setDT(df_b)         

# join and update "df" by reference, i.e. without copy 
df_b[df_a, on = c("id" = "SKU", "price" = "PRECIO_LISTA"), sale_price := PRECIO_INDEXADO]
df_b

#>         id                                 title
#> 1: 4896 PG Ventilador 3en1 Air Monster 4896 40cm
#>                                                                                description
#> 1: Tu mejor aliado contra los días de intenso calor... este ventilador 3 en 1 Air Monster.
#>    google_product_category
#> 1:                      NA
#>                                                                               link
#> 1: https://www.lacuracao.pe/curacao/ventilador-3en1-air-monster-4896-40cm--4896-pg
#>                                                            image_link
#> 1: http://www.lacuracao.pe/wcsstore/efe_cat_as/646x1000/4896 PG_1.jpg
#>    additional_image_link availability price sale_price
#> 1:                    NA            1    80         29

Edit: to update both price and sale_price

df_b[df_a, on = .(id = SKU), c("price", "sale_price") := list(PRECIO_LISTA, PRECIO_INDEXADO)]
df_b

Created on 2018-03-16 by the reprex package (v0.2.0).

Upvotes: 0

tspano
tspano

Reputation: 701

If I understand correctly, you want to replace price and sale_price in df_b with PRECIO_LISTA and PRECIO_INDEXADO in df_a, joining only by id = SKU.

You can first discard the current price and sale_price from df_b using select() and then join with df_a keeping only the variables you are interested into. Moreover you can use distinct() to avoid duplicates.

Using the pipes:

library(dplyr)
desired_result = 
  df_b %>% select(-price,-sale_price) %>% 
  left_join(
    df_a %>%
      transmute(id = SKU, price = PRECIO_LISTA, sale_price = PRECIO_INDEXADO) %>%
      distinct()
  )

Upvotes: 1

Related Questions