Reputation: 543
I have two datasets and their dput is below:
Scenario: df1 contains data on unitprices for certain dates and times. df2 contains the competitor for each station code in df1 along with prices and competitor brand name. df2 does not have all the unit prices for dates and times in df1. I need to somehow merge df2 in df1 such that I get date-time, Product, ServiceType & Brand but NA in unitprices for competitors so I can impute these values after.
What I am trying to do is as follows:
Find out which date-time unit prices are missing in df2 based on values in df1 conditioned on similar Station Code, Product & Service Type. Each station code in df1 has multiple "stationscode" in df2 as there can be be more than one competitor.
Somehow extract and merge this information so I can impute values after.
Let's suppose df1 is
date StationCode ServiceType Product unitprice
1 2017-06-01 06:00:00 1002 Self-Service GG 1.345
2 2017-06-01 07:00:00 1002 Served GG 1.456
3 2017-06-01 07:00:00 1002 Self-Service SP 1.012
dput(df1)
structure(list(date = structure(c(1496300400,
1496304000, 1496304000 ), class = c("POSIXct", "POSIXt"), tzone =
"Etc/GMT+1"), StationCode = c(1002, 1002, 1002), ServiceType =
structure(c(1L, 2L, 1L), .Label = c("Self-Service", "Served"), class
= "factor"), Product = structure(c(1L, 1L, 2L ), .Label = c("GG", "SP"), class = "factor"), unitprice = c(1.345,
1.456, 1.012)), class = "data.frame", row.names = c(NA, -3L))
and df2 is
date compstcode StationCode ServiceType Product unitprice brand
1 2017-06-01 06:00:00 3456 1002 Self-Service GG 1.425 Shell
2 2017-06-01 06:00:00 1267 1002 Served SP 1.406 BP
3 2017-06-01 06:00:00 5488 1002 Self-Service GG 1.011 Total
dput(df2)
structure(list(date = structure(c(1496300400, 1496300400, 1496300400
), class = c("POSIXct", "POSIXt"), tzone = "Etc/GMT+1"), compstcode = c(3456,
1267, 5488), StationCode = c(1002, 1002, 1002), ServiceType = structure(c(1L,
2L, 1L), .Label = c("Self-Service", "Served"), class = "factor"),
Product = structure(c(1L, 2L, 1L), .Label = c("GG", "SP"), class = "factor"),
unitprice = c(1.425, 1.406, 1.011), brand = structure(c(2L,
1L, 3L), .Label = c("BP", "Shell", "Total"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))
I want to extract compstcode, uprice and brand from df2 into df1 such that we have the compstcode and brand not null whereas uprice can be null for all dates and times of df1.
Upvotes: 0
Views: 42
Reputation: 7292
I think you're essentially looking to join these two data frames based on multiple criteria, but let me know if that doesn't capture what you're trying to do.
library(tidyverse)
df1 %>% left_join(., df2 %>% select(-date), by = c("StationCode", "ServiceType", "Product"))
date StationCode ServiceType Product unitprice.x compstcode unitprice.y brand
1 2017-06-01 06:00:00 1002 Self-Service GG 1.345 3456 1.425 Shell
2 2017-06-01 06:00:00 1002 Self-Service GG 1.345 5488 1.011 Total
3 2017-06-01 07:00:00 1002 Served GG 1.456 NA NA <NA>
4 2017-06-01 07:00:00 1002 Self-Service SP 1.012 NA NA <NA>
But you don't have to exclude date, it could similarly be (renaming columns for clarity):
df1 %>% left_join(., df2 %>% rename(compDate = date, compunitprice = unitprice), by = c("StationCode", "ServiceType", "Product"))
Upvotes: 1