Reputation: 59
I need to capture some values from dataframe A (colname : "price") and put them in dataframe B ("PRECIO_LISTA"). Same with column "sale_price" from dataframe A, put the value in column "PRECIO_INDEXADO" in dataframe B.
dataframe A
df_a <- structure(list(TIENDA = c("CURACAO", "CURACAO", "CURACAO", "CURACAO",
"CURACAO", "CURACAO", "CURACAO", "CURACAO", "CURACAO"), SKU = c("4896 PG",
"4896 PG", "4896 PG", "4896 PG", "4896 PG", "4896 PG", "4896 PG",
"4896 PG", "4896 PG"), NOMBRE = c("Ventilador 3en1 Air Monster 4896 40cm",
"Ventilador 3en1 Air Monster 4896 40cm", "Ventilador 3en1 Air Monster 4896 40cm",
"Ventilador 3en1 Air Monster 4896 40cm", "Ventilador 3en1 Air Monster 4896 40cm",
"Ventilador 3en1 Air Monster 4896 40cm", "Ventilador 3en1 Air Monster 4896 40cm",
"Ventilador 3en1 Air Monster 4896 40cm", "Ventilador 3en1 Air Monster 4896 40cm"
), PRECIO_OFERTA = c(29, 29, 29, 29, 29, 29, 29, 29, 29), PRECIO_LISTA = c(80,
80, 80, 80, 80, 80, 80, 80, 80), PRECIO_INDEXADO = c(29, 29,
29, 29, 29, 29, 29, 29, 29)), .Names = c("TIENDA", "SKU", "NOMBRE",
"PRECIO_OFERTA", "PRECIO_LISTA", "PRECIO_INDEXADO"), row.names = c(NA,
-9L), class = c("tbl_df", "tbl", "data.frame"))
data frame B
df_b <- structure(list(id = "4896 PG", title = "Ventilador 3en1 Air Monster 4896 40cm",
description = "Tu mejor aliado contra los días de intenso calor... este ventilador 3 en 1 Air Monster.",
google_product_category = NA, link = "https://www.lacuracao.pe/curacao/ventilador-3en1-air-monster-4896-40cm--4896-pg",
image_link = "http://www.lacuracao.pe/wcsstore/efe_cat_as/646x1000/4896 PG_1.jpg",
additional_image_link = NA, availability = 1, price = 80,
sale_price = 49), .Names = c("id", "title", "description",
"google_product_category", "link", "image_link", "additional_image_link",
"availability", "price", "sale_price"), row.names = c(NA, -1L
), class = c("tbl_df", "tbl", "data.frame"))
Desired ouput:
Data Frame B (df_b) with the values from data Frame A. The data provided is just for 1 product with SKU (id): "4896 PG".
Right now, df_b has values: "price" = 80, "sale_price" = 49. But it should be: "price": 80, "sale_price": 29.
I tried this without success:
desired_result <- inner_join(df_b,
df_a,
by = c("id" ="SKU", "price" = "PRECIO_LISTA",
"sale_price" = "PRECIO_INDEXADO"))
Upvotes: 1
Views: 1039
Reputation: 28391
Borrowed from this question, you can use data.table
to do it
library(data.table)
setDT(df_a) # convert to a data.table without copy
setDT(df_b)
# join and update "df" by reference, i.e. without copy
df_b[df_a, on = c("id" = "SKU", "price" = "PRECIO_LISTA"), sale_price := PRECIO_INDEXADO]
df_b
#> id title
#> 1: 4896 PG Ventilador 3en1 Air Monster 4896 40cm
#> description
#> 1: Tu mejor aliado contra los días de intenso calor... este ventilador 3 en 1 Air Monster.
#> google_product_category
#> 1: NA
#> link
#> 1: https://www.lacuracao.pe/curacao/ventilador-3en1-air-monster-4896-40cm--4896-pg
#> image_link
#> 1: http://www.lacuracao.pe/wcsstore/efe_cat_as/646x1000/4896 PG_1.jpg
#> additional_image_link availability price sale_price
#> 1: NA 1 80 29
Edit: to update both price
and sale_price
df_b[df_a, on = .(id = SKU), c("price", "sale_price") := list(PRECIO_LISTA, PRECIO_INDEXADO)]
df_b
Created on 2018-03-16 by the reprex package (v0.2.0).
Upvotes: 0
Reputation: 701
If I understand correctly, you want to replace price
and sale_price
in df_b
with PRECIO_LISTA
and PRECIO_INDEXADO
in df_a
, joining only by id = SKU
.
You can first discard the current price
and sale_price
from df_b
using select()
and then join with df_a
keeping only the variables you are interested into. Moreover you can use distinct()
to avoid duplicates.
Using the pipes:
library(dplyr)
desired_result =
df_b %>% select(-price,-sale_price) %>%
left_join(
df_a %>%
transmute(id = SKU, price = PRECIO_LISTA, sale_price = PRECIO_INDEXADO) %>%
distinct()
)
Upvotes: 1