Reputation: 522
My dataframe "fsp" as 1702551 obs and 3 variables. It look like this:
tibble [1,702,551 x 3]
$ date : Date[1:1702551], format: "2011-04-12" "2011-04-12" "2011-04-12" ...
$ wavelength : num [1:1702551] 350 351 352 353 354 355 356 357 358 359 ...
$ ID : chr [1:1702551] "c01" "c01" "c01" "c01" ...
Quick explanation of the data: Per each "date" and "ID" I had a spectral data (not shown) throughout the wavelength interval (350 to 2300nm). I want to create a new column "target_ID" with a sequence of repeating numers that increases to the next consecutive number each time date or ID changes. For example for the first ID, "c01" and date "2011-04-12" I will have a column with the number 1 from the wavelength 350 to 2300. The next ID will have the number 2 and so on (along the dataframe "date" changes as well)
Example of what I want to achieve (look "target_ID"):
|date |wavelength|ID |target_ID|
|:---------|:---------|:---|:--------|
|2011-04-12|350 |c01 |1 |
|2011-04-12|351 |c01 |1 |
|2011-04-12|352 |c01 |1 |
|2011-04-12|353 |c01 |1 |
|...…………………|...……………….|....|...…………….|
|2011-04-12|350 |c03 |2 |
|2011-04-12|351 |c03 |2 |
|...……………..|...……………….|....|...………………|
|2011-04-13|350 |c01 |3 |
|2011-04-13|351 |c01 |3 |
This is the code that I already tried but without success:
fsp<-fsp %>%
group_by(date, ID) %>%
mutate(target_ID, count=n())
Any help will be much appreciatted.
Thank you in advance.
Upvotes: 2
Views: 1772
Reputation: 10761
This is a perfect use case for the rleid
function from the data.table
package:
# example data
xx <- rep(Sys.Date(), 5)
xx <- c(xx, xx + lubridate::days(1))
id <- rep(c(1:4), c(2,3,3,2))
dat <- data.frame(date = xx, id = id)
# date id
# 1 2021-03-29 1
# 2 2021-03-29 1
# 3 2021-03-29 2
# 4 2021-03-29 2
# 5 2021-03-29 2
# 6 2021-03-30 3
# 7 2021-03-30 3
# 8 2021-03-30 3
# 9 2021-03-30 4
# 10 2021-03-30 4
library(data.table)
dat_dt <- as.data.table(dat)
dat_dt[,target_id := rleid(date, id)]
# date id target_id
# 1: 2021-03-29 1 1
# 2: 2021-03-29 1 1
# 3: 2021-03-29 2 2
# 4: 2021-03-29 2 2
# 5: 2021-03-29 2 2
# 6: 2021-03-30 3 3
# 7: 2021-03-30 3 3
# 8: 2021-03-30 3 3
# 9: 2021-03-30 4 4
#10: 2021-03-30 4 4
And here's how you could use %>%
and mutate
to solve it:
library(tidyverse)
dat %>%
mutate(target_id = data.table::rleid(date, id))
Upvotes: 2