Reputation: 4243
I have a dataframe as follows:
Date Pulled Date Col3 Col4
2019-01-19 2019-01-17 8 9
2019-01-19 2019-01-18 14 9
2019-01-20 2019-01-18 8 0
2019-01-20 2019-01-18 15 14
2019-01-18 2019-01-17 18 7
I want to write the logic stating -
Whenever Date Pulled
has a different value AND given that Date Pulled
values are different, the corresponding Date
values are the same in that column, I want to only keep the row with the max value from Date Pulled
.
Date Pulled Date Col3 Col4
2019-01-19 2019-01-17 8 9
2019-01-20 2019-01-18 8 0
2019-01-20 2019-01-18 15 14
For context, I am pulling in data daily that has 7 days everytime. If I rbind the results together, there are going to be duplicate dates (hence duplicate Date columns). I want to only keep the most recent report I pull in hence the max Date Pulled.
Upvotes: 1
Views: 785
Reputation: 886938
Assuming the 'Col1' and 'Col2' are Date
class, grouped by 'Col2' and filter
the rows where the 'Col1' is equal to max
of 'Col1'
library(dplyr)
df1 %>%
group_by(Col2) %>%
filter((Col1 == max(Col1) )
# A tibble: 3 x 4
# Groups: Col2 [2]
# Col1 Col2 Col3 Col4
# <date> <date> <int> <int>
#1 2019-01-19 2019-01-17 8 9
#2 2019-01-20 2019-01-18 8 0
#3 2019-01-20 2019-01-18 15 14
df1 <- structure(list(Col1 = structure(c(17915, 17915, 17916, 17916,
17914), class = "Date"), Col2 = structure(c(17913, 17914, 17914,
17914, 17913), class = "Date"), Col3 = c(8L, 14L, 8L, 15L, 18L
), Col4 = c(9L, 9L, 0L, 14L, 7L)), row.names = c(NA, -5L), class = "data.frame")
Upvotes: 3
Reputation: 66819
I want to only keep the most recent report I pull in hence the max Date Pulled.
This seems to work:
inner_join(
DT,
DT %>% group_by(Date) %>% summarise(Pulled = max(Pulled))
)
Joining, by = c("Pulled", "Date")
Pulled Date Col3 Col4
1 2019-01-19 2019-01-17 8 9
2 2019-01-20 2019-01-18 8 0
3 2019-01-20 2019-01-18 15 14
where
DT = structure(list(Pulled = c("2019-01-19", "2019-01-19", "2019-01-20",
"2019-01-20", "2019-01-18"), Date = c("2019-01-17", "2019-01-18",
"2019-01-18", "2019-01-18", "2019-01-17"), Col3 = c(8L, 14L,
8L, 15L, 18L), Col4 = c(9L, 9L, 0L, 14L, 7L)), row.names = c(NA,
-5L), class = "data.frame")
(That is, I did not bother converting to date class.)
Upvotes: 1