Cláudio Siva
Cláudio Siva

Reputation: 522

How to create a sequence numerical column based on two columns in r?

My dataframe "fsp" as 1702551 obs and 3 variables. It look like this:

tibble [1,702,551 x 3] 
 $ date       : Date[1:1702551], format: "2011-04-12" "2011-04-12" "2011-04-12" ...
 $ wavelength : num [1:1702551] 350 351 352 353 354 355 356 357 358 359 ...
 $ ID         : chr [1:1702551] "c01" "c01" "c01" "c01" ...

Quick explanation of the data: Per each "date" and "ID" I had a spectral data (not shown) throughout the wavelength interval (350 to 2300nm). I want to create a new column "target_ID" with a sequence of repeating numers that increases to the next consecutive number each time date or ID changes. For example for the first ID, "c01" and date "2011-04-12" I will have a column with the number 1 from the wavelength 350 to 2300. The next ID will have the number 2 and so on (along the dataframe "date" changes as well)

Example of what I want to achieve (look "target_ID"):

|date      |wavelength|ID  |target_ID|
|:---------|:---------|:---|:--------|   
|2011-04-12|350       |c01 |1        |
|2011-04-12|351       |c01 |1        |
|2011-04-12|352       |c01 |1        |
|2011-04-12|353       |c01 |1        |
|...…………………|...……………….|....|...…………….|        
|2011-04-12|350       |c03 |2        |
|2011-04-12|351       |c03 |2        |
|...……………..|...……………….|....|...………………|
|2011-04-13|350       |c01 |3        |
|2011-04-13|351       |c01 |3       |

This is the code that I already tried but without success:

fsp<-fsp %>%
group_by(date, ID) %>%
mutate(target_ID, count=n())

Any help will be much appreciatted.

Thank you in advance.

Upvotes: 2

Views: 1772

Answers (1)

bouncyball
bouncyball

Reputation: 10761

This is a perfect use case for the rleid function from the data.table package:

# example data
xx <- rep(Sys.Date(), 5)
xx <- c(xx, xx + lubridate::days(1))
id <- rep(c(1:4), c(2,3,3,2))
dat <- data.frame(date = xx, id = id)

#          date id
# 1  2021-03-29  1
# 2  2021-03-29  1
# 3  2021-03-29  2
# 4  2021-03-29  2
# 5  2021-03-29  2
# 6  2021-03-30  3
# 7  2021-03-30  3
# 8  2021-03-30  3
# 9  2021-03-30  4
# 10 2021-03-30  4

library(data.table)
dat_dt <- as.data.table(dat)
dat_dt[,target_id := rleid(date, id)]

 #          date id target_id
 # 1: 2021-03-29  1         1
 # 2: 2021-03-29  1         1
 # 3: 2021-03-29  2         2
 # 4: 2021-03-29  2         2
 # 5: 2021-03-29  2         2
 # 6: 2021-03-30  3         3
 # 7: 2021-03-30  3         3
 # 8: 2021-03-30  3         3
 # 9: 2021-03-30  4         4
 #10: 2021-03-30  4         4

And here's how you could use %>% and mutate to solve it:

library(tidyverse)
dat %>%
    mutate(target_id = data.table::rleid(date, id))

Upvotes: 2

Related Questions