Reputation: 1548
I have a dataframe that contains the following columns: city
, amenity
, date
and I want to add a column last
based on grouping together city
and amenity
, then taking the most recent date.
input dataframe:
| city | amenity | date |
|-----------|---------------|---------------------|
| rome | stadium | 2020-02-25 19:10:40 |
| new york | concert hall | 2020-03-09 18:15:29 |
| rome | stadium | 2020-02-29 15:07:23 |
| stockholm | swimming pool | 2020-03-02 11:23:54 |
| new york | skate park | 2020-03-12 13:41:35 |
| stockholm | swimming pool | 2020-03-13 17:54:23 |
| stockholm | swimming pool | 2020-03-18 19:18:29 |
desired output:
| city | amenity | date | last |
|-----------|---------------|---------------------|------|
| rome | stadium | 2020-02-25 19:10:40 | |
| new york | concert hall | 2020-03-09 18:15:29 | TRUE |
| rome | stadium | 2020-02-29 15:07:23 | TRUE |
| stockholm | swimming pool | 2020-03-02 11:23:54 | |
| new york | skate park | 2020-03-12 13:41:35 | TRUE |
| stockholm | swimming pool | 2020-03-13 17:54:23 | |
| stockholm | swimming pool | 2020-03-18 19:18:29 | TRUE |
Data:
df <- structure(list(city = c("rome", "newyork", "rome", "stockholm",
"newyork", "stockholm", "stockholm"), amenity = c("stadium",
"concert_hall", "stadium", "swimming_pool", "skate_park", "swimming_pool",
"swimming_pool"), date = structure(c(1582632640, 1583752529,
1582963643, 1583123034, 1583995295, 1584096863, 1584533909), class = c("POSIXct",
"POSIXt"), tzone = "")), row.names = c(NA, -7L), class = "data.frame")
Upvotes: 1
Views: 346
Reputation: 145975
Assuming your data is sorted by date
, df$last = !duplicated(df[, c("city", "amenity")], fromLast = TRUE)
. That will put FALSE
instead of missing values, but should work.
Upvotes: 0
Reputation: 9257
Using dplyr
:
df %>%
group_by(city, amenity) %>%
mutate(
last = if_else(date == max(date), TRUE, NA)
)
I've set NA
instead of FALSE
since in your desired output there is no value in the non-last dates.
Output
# A tibble: 7 x 4
# Groups: city, amenity [4]
city amenity date last
<chr> <chr> <dttm> <lgl>
1 rome stadium 2020-02-25 13:10:40 NA
2 newyork concert_hall 2020-03-09 12:15:29 TRUE
3 rome stadium 2020-02-29 09:07:23 TRUE
4 stockholm swimming_pool 2020-03-02 05:23:54 NA
5 newyork skate_park 2020-03-12 07:41:35 TRUE
6 stockholm swimming_pool 2020-03-13 11:54:23 NA
7 stockholm swimming_pool 2020-03-18 13:18:29 TRUE
Upvotes: 0
Reputation:
Untested since the data cannot easily be copied into R, but something like this.
data %>%
group_by(city, amenity) %>%
mutate(last = (date == max(date)))
Upvotes: 1