Reputation: 195
I have a dataframe that has been sorted by userid and date as below:
df <- data.frame(
userid = c("1", "1", "1", "2", "2", "3", "3"),
date = c("2016-10-23", "2018-01-01", "2020-03-03", "2008-09-04", "2021-08-08", "2019-10-02", "2022-01-02")
)
userid date
1 2016-10-23
1 2018-01-01
1 2020-03-03
2 2008-09-04
2 2021-08-08
3 2019-10-02
3 2022-01-02
I want to group by userid and assign row numbers, BUT anything before 2017 should be row number 0:
df2 <- data.frame(
userid = c("1", "1", "1", "2", "2", "3", "3"),
date = c("2016-10-23", "2018-01-01", "2020-03-03", "2008-09-04", "2021-08-08", "2019-10-02", "2022-01-02"),
rownumber = c("0", "1", "2", "0", "1", "1", "2")
)
userid date rownumber
1 2016-10-23 0
1 2018-01-01 1
1 2020-03-03 2
2 2008-09-04 0
2 2021-08-08 1
3 2019-10-02 1
3 2022-01-02 2
As you can see, the dates with years 2016 and 2008 have been assigned as 0 and the rownumber increments from 1 excluding those.
I have tried the below, but it does not assign 0 to dates before 2017.
df %>%
group_by(userid) %>%
mutate(rownumber = row_number()) %>%
ungroup()
Thanks!
Upvotes: 2
Views: 1073
Reputation: 389265
You can extract the year from the date. For each userid
subtract the row number by number of values that are <=
2017.
library(dplyr)
df %>%
mutate(date = as.Date(date),
year = lubridate::year(date)) %>%
arrange(userid, date) %>%
group_by(userid) %>%
mutate(rownumber = pmax(row_number() - sum(year <= 2017), 0)) %>%
ungroup
# userid date year rownumber
# <chr> <date> <dbl> <int>
#1 1 2016-10-23 2016 0
#2 1 2018-01-01 2018 1
#3 1 2020-03-03 2020 2
#4 2 2008-09-04 2008 0
#5 2 2021-08-08 2021 1
#6 3 2019-10-02 2019 1
#7 3 2022-01-02 2022 2
Upvotes: 2
Reputation: 160952
df %>%
mutate(date = as.Date(date)) %>%
group_by(userid, year = (1900+as.POSIXlt(date)$year) < 2018) %>%
mutate(rownumber = if (year[1]) 0 else order(date)) %>%
ungroup() %>%
select(-year)
# # A tibble: 7 x 3
# userid date rownumber
# <chr> <date> <dbl>
# 1 1 2016-10-23 0
# 2 1 2018-01-01 1
# 3 1 2020-03-03 2
# 4 2 2008-09-04 0
# 5 2 2021-08-08 1
# 6 3 2019-10-02 1
# 7 3 2022-01-02 2
I could have used row_number()
in place of order(date)
, but I didn't know for certain if your data was known to be pre-sorted. You can use row_number()
for the same results here.
(FYI, your date
is a string, so I also converted it to a proper Date
class, for better or worse. If you don't want to do that, you can replace the first mutate
and the (1900+...)<2018
with as.integer(substring(date,1,4)) < 2018
for the same effect.)
Upvotes: 0