sanna
sanna

Reputation: 1548

dplyr assign value to column based on multiple filtering

I have a dataframe that contains the following columns: city, amenity, date and I want to add a column last based on grouping together city and amenity, then taking the most recent date.

input dataframe:

| city      | amenity       | date                |
|-----------|---------------|---------------------|
| rome      | stadium       | 2020-02-25 19:10:40 | 
| new york  | concert hall  | 2020-03-09 18:15:29 |
| rome      | stadium       | 2020-02-29 15:07:23 |
| stockholm | swimming pool | 2020-03-02 11:23:54 |
| new york  | skate park    | 2020-03-12 13:41:35 |
| stockholm | swimming pool | 2020-03-13 17:54:23 |
| stockholm | swimming pool | 2020-03-18 19:18:29 |

desired output:

| city      | amenity       | date                | last |
|-----------|---------------|---------------------|------|
| rome      | stadium       | 2020-02-25 19:10:40 |      |
| new york  | concert hall  | 2020-03-09 18:15:29 | TRUE |
| rome      | stadium       | 2020-02-29 15:07:23 | TRUE |
| stockholm | swimming pool | 2020-03-02 11:23:54 |      |
| new york  | skate park    | 2020-03-12 13:41:35 | TRUE |
| stockholm | swimming pool | 2020-03-13 17:54:23 |      |
| stockholm | swimming pool | 2020-03-18 19:18:29 | TRUE |

Data:

df <- structure(list(city = c("rome", "newyork", "rome", "stockholm", 
"newyork", "stockholm", "stockholm"), amenity = c("stadium", 
"concert_hall", "stadium", "swimming_pool", "skate_park", "swimming_pool", 
"swimming_pool"), date = structure(c(1582632640, 1583752529, 
1582963643, 1583123034, 1583995295, 1584096863, 1584533909), class = c("POSIXct", 
"POSIXt"), tzone = "")), row.names = c(NA, -7L), class = "data.frame")

Upvotes: 1

Views: 346

Answers (3)

Gregor Thomas
Gregor Thomas

Reputation: 145975

Assuming your data is sorted by date, df$last = !duplicated(df[, c("city", "amenity")], fromLast = TRUE). That will put FALSE instead of missing values, but should work.

Upvotes: 0

Ric S
Ric S

Reputation: 9257

Using dplyr:

df %>%
  group_by(city, amenity) %>% 
  mutate(
    last = if_else(date == max(date), TRUE, NA)
  )

I've set NA instead of FALSE since in your desired output there is no value in the non-last dates.


Output

# A tibble: 7 x 4
# Groups:   city, amenity [4]
  city      amenity       date                last 
  <chr>     <chr>         <dttm>              <lgl>
1 rome      stadium       2020-02-25 13:10:40 NA   
2 newyork   concert_hall  2020-03-09 12:15:29 TRUE 
3 rome      stadium       2020-02-29 09:07:23 TRUE 
4 stockholm swimming_pool 2020-03-02 05:23:54 NA   
5 newyork   skate_park    2020-03-12 07:41:35 TRUE 
6 stockholm swimming_pool 2020-03-13 11:54:23 NA   
7 stockholm swimming_pool 2020-03-18 13:18:29 TRUE 

Upvotes: 0

user10917479
user10917479

Reputation:

Untested since the data cannot easily be copied into R, but something like this.

data %>%
  group_by(city, amenity) %>%
  mutate(last = (date == max(date)))

Upvotes: 1

Related Questions