I'm using R to create a table from the data in another table, and I'm working with the following variables:
-PRODUCT ID
-CLASIFICATION
-DATE
For example, my source table:
product_id  Clasification  Date
10000567    B+             12-12-2020
10000123    C+             26-11-2020
10000567    A+             02-11-2020
10000222    A+             09-10-2020
10000123    B++            21-09-2020
10000222    A++            10-09-2020
I need to get the most recent classification for each product ID, because the classification is a dynamic field that can change at any time. The result should have one row per product ID.
Any help would be great.
Thanks!
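You can use slice_max() in dplyr, which supersedes top_n() as of version 1.0.0, to select the most recent date. Convert Date with as.Date() first so the comparison is chronological rather than alphabetical.
library(dplyr)

df %>%
  # parse the dd-mm-yyyy strings so dates compare chronologically
  mutate(Date = as.Date(Date, "%d-%m-%Y")) %>%
  group_by(product_id) %>%
  # keep the row with the latest date in each group
  # (tied rows are all kept unless you add with_ties = FALSE)
  slice_max(Date, n = 1) %>%
  ungroup()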
# # A tibble: 3 x 3
# product_id Clasification Date
# <int> <chr> <date>
# 1 10000123 C+ 2020-11-26
# 2 10000222 A+ 2020-10-09
# 3 10000567 B+ 2020-12-12
Data
df <- structure(list(product_id = c(10000567L, 10000123L, 10000567L,
10000222L, 10000123L, 10000222L), Clasification = c("B+", "C+",
"A+", "A+", "B++", "A++"), Date = c("12-12-2020", "26-11-2020",
"02-11-2020", "09-10-2020", "21-09-2020", "10-09-2020")), class = "data.frame", row.names = c(NA, -6L))
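By default slice_max() keeps every tied row (with_ties = TRUE), so a product with two entries on its latest date would still come back as two rows. A minimal sketch of forcing exactly one row per product, using a small hypothetical data frame (df_ties) in which product 10000567 has two classifications on the same date:

```r
library(dplyr)

# Hypothetical data: product 10000567 has two rows on its latest date
df_ties <- data.frame(
  product_id = c(10000567L, 10000567L, 10000123L, 10000222L),
  Clasification = c("B+", "A+", "C+", "A+"),
  Date = c("12-12-2020", "12-12-2020", "26-11-2020", "09-10-2020")
)

latest <- df_ties %>%
  # parse the text dates so they compare chronologically
  mutate(Date = as.Date(Date, "%d-%m-%Y")) %>%
  group_by(product_id) %>%
  # with_ties = FALSE returns exactly n = 1 row per group, even on ties
  slice_max(Date, n = 1, with_ties = FALSE) %>%
  ungroup()

latest
```

If it matters which of the tied rows is kept, add an explicit tie-breaking rule rather than relying on the row order.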
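Assuming your dates are not already sorted, you can sort them newest first and keep the first row of each group. The Date column has to be parsed with as.Date() first, because sorting the raw dd-mm-yyyy strings orders them by day rather than chronologically:
library(dplyr)

df %>%
  # parse the text dates so desc() sorts chronologically
  mutate(Date = as.Date(Date, "%d-%m-%Y")) %>%
  # newest date first
  arrange(desc(Date)) %>%
  group_by(product_id) %>%
  # the first row per product is its most recent classification
  slice(1) %>%
  ungroup()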
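Parsing the dates matters because, stored as dd-mm-yyyy text, the "largest" string is not the most recent date. The same sort-then-take-first idea can also be sketched in base R without dplyr; this swaps in order() and duplicated() and is an alternative, not what the answers above use:

```r
# Example data from the question (Date stored as dd-mm-yyyy text)
df <- data.frame(
  product_id = c(10000567L, 10000123L, 10000567L, 10000222L, 10000123L, 10000222L),
  Clasification = c("B+", "C+", "A+", "A+", "B++", "A++"),
  Date = c("12-12-2020", "26-11-2020", "02-11-2020", "09-10-2020", "21-09-2020", "10-09-2020")
)

# As text, comparison goes character by character, starting with the day
max(df$Date)  # "26-11-2020", although 12-12-2020 is later

# Parse to Date so comparisons are chronological
df$Date <- as.Date(df$Date, format = "%d-%m-%Y")

# Sort by product, newest date first within each product...
df_sorted <- df[order(df$product_id, -as.numeric(df$Date)), ]
# ...then keep the first row seen for each product_id
latest <- df_sorted[!duplicated(df_sorted$product_id), ]
rownames(latest) <- NULL
latest
```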