Martin Thompson
Martin Thompson

Reputation: 3755

R: Find Min Date from Another Matrix

I'm a newbie to R & trying to solve this (coming from a SQL background ) .

In the data below - I want to add another column to items ( ideally in a new matrix new_items ) that represents the product data and gets the most recent sale from the sales matrix - and shows N/A if there are no sales. Note that items are unique.

items:

id    Sku    Name
1     code1  Product1
2     code2  Product2
3     code3  Product3

sales

saleid    itemid    date         qty
1001      1         01-Jan-2016  1
1002      1         01-Feb-2016  1
1003      2         01-Dec-2016  2


new_items
Sku    Name       LastSale
code1  Product1   01-Feb-2016
code2  Product2   01-Dec-2016
code3  Product3   N/A

Upvotes: 0

Views: 41

Answers (2)

Edgar Santos
Edgar Santos

Reputation: 3504

Something like this should work (but using data frames instead):

library(dplyr)
library(lubridate)

sales$date <- dmy(sales$date)
both <- items %>% left_join(sales, by = c("id" = "itemid"))
new_items <- both %>% group_by(id) %>%  summarise(maxdate = max(date))

     id    maxdate
  <int>     <date>
1     1 2016-02-01
2     2 2016-12-01
3     3       <NA>

Or:

sales2 <- sales %>% group_by(itemid) %>%  summarise(maxdate = max(date))
items %>% left_join(sales2, by = c("id" = "itemid"))

  id   Sku     Name    maxdate
1  1 code1 Product1 2016-02-01
2  2 code2 Product2 2016-12-01
3  3 code3 Product3       <NA>

Data:

items <- read.table(text= "id    Sku    Name
    1     code1  Product1
    2     code2  Product2
    3     code3  Product3", stringsAsFactors=TRUE, header = TRUE)

sales <- read.table(text= "saleid    itemid    date         qty
    1001      1         01-Jan-2016  1
    1002      1         01-Feb-2016  1
    1003      2         01-Dec-2016  2", stringsAsFactors=TRUE, header = TRUE)

Upvotes: 1

Andrew Lavers
Andrew Lavers

Reputation: 4378

library(dplyr)
# convert the text date to a date to sort correctly
sales$ sale_date <- dmy(sales$date)

# find the latest single sale for each item
latest_sales <- sales %>%
  group_by(itemid) %>%
  top_n(1, sale_date) %>%
  rename(LastSale = sale_date)

# join items with latest sale
new_items <- items %>%
  left_join(latest_sales, by = c("id" = "itemid")) %>%
  select(Sku, Name, LastSale)

  #     Sku     Name   LastSale
  # 1 code1 Product1 2016-02-01
  # 2 code2 Product2 2016-12-01
  # 3 code3 Product3       <NA>

Upvotes: 0

Related Questions