M.E. Baake
M.E. Baake

Reputation: 23

New column in R which is cumulative counting (adding up) unique dates

I have a question regarding creating a new specific variable in a large dataset (36000 rows). I want to create a variable that is cumulative counting the days. So for example:

Date (dd/mm/yyyy)
01-01-2018
01-01-2018
02-01-2018
02-01-2018
02-01-2018
02-01-2018
03-01-2018
03-01-2018

I want to create the following numeric variable which counts the unique days in order, so:

New Variable
1
1
2
2
2
2
3
3

So, the first unique date gets a 1, the second unique date a 2, etc. In total I have more than 36000 observations, where there are 1096 unique dates (three years). So I want a variable that is having a 1 for the first date, a two for the second date, and a 1096 for the last date. However, the dates do not appear equally: one unique date appears maybe 30 times, the other unique date 50 times.

How can I create a variable like this in R?

Upvotes: 2

Views: 81

Answers (1)

r2evans
r2evans

Reputation: 160852

Two methods:

  • if you are certain that they are ordered correctly, then you can use:

    cumsum(c(TRUE, diff(x$Date) != 0))
    # [1] 1 2 2 3 4 4 5 5
    
  • if you are not certain or just want to guard against it, you can borrow from how factors are stored:

    # randomize the data for this example
    set.seed(2)
    x <- x[sample(seq_len(nrow(x))),,drop=FALSE]
    x
    #         Date
    # 2 2018-01-01
    # 5 2018-01-02
    # 4 2018-01-02
    # 1 2018-01-01
    # 6 2018-01-02
    # 3 2018-01-02
    # 7 2018-01-03
    # 8 2018-01-03
    
    # this is the real work, works even if ordered
    x$NewVar <- as.integer(factor(x$Date))
    x
    #         Date NewVar
    # 2 2018-01-01      1
    # 5 2018-01-02      2
    # 4 2018-01-02      2
    # 1 2018-01-01      1
    # 6 2018-01-02      2
    # 3 2018-01-02      2
    # 7 2018-01-03      3
    # 8 2018-01-03      3
    
    # reorder for presentation here:
    x[order(x$NewVar),]
    #         Date NewVar
    # 2 2018-01-01      1
    # 1 2018-01-01      1
    # 5 2018-01-02      2
    # 4 2018-01-02      2
    # 6 2018-01-02      2
    # 3 2018-01-02      2
    # 7 2018-01-03      3
    # 8 2018-01-03      3
    

Data:

x <- read.table(stringsAsFactors=FALSE, header=TRUE, text="
Date
01-01-2018
01-01-2018
02-01-2018
02-01-2018
02-01-2018
02-01-2018
03-01-2018
03-01-2018")
x$Date <- as.Date(x$Date, format = "%d-%m-%Y")

Upvotes: 2

Related Questions