New column in R which is cumulative counting (adding up) unique dates

Question

I have a question regarding creating a new specific variable in a large dataset (36000 rows). I want to create a variable that is cumulative counting the days. So for example:

Date (dd/mm/yyyy)
01-01-2018
01-01-2018
02-01-2018
02-01-2018
02-01-2018
02-01-2018
03-01-2018
03-01-2018

I want to create the following numeric variable which counts the unique days in order, so:

New Variable
1
1
2
2
2
2
3
3

So, the first unique date gets a 1, the second unique date a 2, etc. In total I have more than 36000 observations, where there are 1096 unique dates (three years). So I want a variable that is having a 1 for the first date, a two for the second date, and a 1096 for the last date. However, the dates do not appear equally: one unique date appears maybe 30 times, the other unique date 50 times.

How can I create a variable like this in R?

r2evans · Accepted Answer

Two methods:

if you are certain that they are ordered correctly, then you can use:
```
cumsum(c(TRUE, diff(x$Date) != 0))
# [1] 1 2 2 3 4 4 5 5
```

if you are not certain or just want to guard against it, you can borrow from how factors are stored:

# randomize the data for this example
set.seed(2)
x <- x[sample(seq_len(nrow(x))),,drop=FALSE]
x
#         Date
# 2 2018-01-01
# 5 2018-01-02
# 4 2018-01-02
# 1 2018-01-01
# 6 2018-01-02
# 3 2018-01-02
# 7 2018-01-03
# 8 2018-01-03

# this is the real work, works even if ordered
x$NewVar <- as.integer(factor(x$Date))
x
#         Date NewVar
# 2 2018-01-01      1
# 5 2018-01-02      2
# 4 2018-01-02      2
# 1 2018-01-01      1
# 6 2018-01-02      2
# 3 2018-01-02      2
# 7 2018-01-03      3
# 8 2018-01-03      3

# reorder for presentation here:
x[order(x$NewVar),]
#         Date NewVar
# 2 2018-01-01      1
# 1 2018-01-01      1
# 5 2018-01-02      2
# 4 2018-01-02      2
# 6 2018-01-02      2
# 3 2018-01-02      2
# 7 2018-01-03      3
# 8 2018-01-03      3

Data:

x <- read.table(stringsAsFactors=FALSE, header=TRUE, text="
Date
01-01-2018
01-01-2018
02-01-2018
02-01-2018
02-01-2018
02-01-2018
03-01-2018
03-01-2018")
x$Date <- as.Date(x$Date, format = "%d-%m-%Y")

New column in R which is cumulative counting (adding up) unique dates

Answers (1)

Related Questions