Reputation: 3755
I'm a newbie to R & trying to solve this (coming from a SQL background ) .
In the data below - I want to add another column to items ( ideally in a new matrix new_items ) that represents the product data and gets the most recent sale from the sales matrix - and shows N/A if there are no sales. Note that items are unique.
items:
id Sku Name
1 code1 Product1
2 code2 Product2
3 code3 Product3
sales
saleid itemid date qty
1001 1 01-Jan-2016 1
1002 1 01-Feb-2016 1
1003 2 01-Dec-2016 2
new_items
Sku Name LastSale
code1 Product1 01-Feb-2016
code2 Product2 01-Dec-2016
code3 Product3 N/A
Upvotes: 0
Views: 41
Reputation: 3504
Something like this should work (but using data frames instead):
library(dplyr)
library(lubridate)
sales$date <- dmy(sales$date)
both <- items %>% left_join(sales, by = c("id" = "itemid"))
new_items <- both %>% group_by(id) %>% summarise(maxdate = max(date))
id maxdate
<int> <date>
1 1 2016-02-01
2 2 2016-12-01
3 3 <NA>
Or:
sales2 <- sales %>% group_by(itemid) %>% summarise(maxdate = max(date))
items %>% left_join(sales2, by = c("id" = "itemid"))
id Sku Name maxdate
1 1 code1 Product1 2016-02-01
2 2 code2 Product2 2016-12-01
3 3 code3 Product3 <NA>
Data:
items <- read.table(text= "id Sku Name
1 code1 Product1
2 code2 Product2
3 code3 Product3", stringsAsFactors=TRUE, header = TRUE)
sales <- read.table(text= "saleid itemid date qty
1001 1 01-Jan-2016 1
1002 1 01-Feb-2016 1
1003 2 01-Dec-2016 2", stringsAsFactors=TRUE, header = TRUE)
Upvotes: 1
Reputation: 4378
library(dplyr)
# convert the text date to a date to sort correctly
sales$ sale_date <- dmy(sales$date)
# find the latest single sale for each item
latest_sales <- sales %>%
group_by(itemid) %>%
top_n(1, sale_date) %>%
rename(LastSale = sale_date)
# join items with latest sale
new_items <- items %>%
left_join(latest_sales, by = c("id" = "itemid")) %>%
select(Sku, Name, LastSale)
# Sku Name LastSale
# 1 code1 Product1 2016-02-01
# 2 code2 Product2 2016-12-01
# 3 code3 Product3 <NA>
Upvotes: 0