Reputation: 47
How can I create a consistently column 'id' using two columns? I need to get an index column 'id' by 'Date' and 'Tr' respectively. I have data:
Date Tr
1 20190829 1
2 20190829 1
3 20190829 1
4 20190829 1
5 20190829 2
6 20190829 2
7 20190830 2
8 20190830 2
9 20190830 2
10 20190830 2
11 20190830 1
12 20190830 1
Example data:
data<-data.frame(Date=c(rep(20190829,6),rep(20190830,6)),Tr=c(rep(1,4),rep(2,6),rep(1,2)))
I'm trying to use somthing like this:
data$id <- cumsum(c(TRUE, diff(data$Tr) != 0))
Date Tr Id
1 20190829 1 1
2 20190829 1 1
3 20190829 1 1
4 20190829 1 1
5 20190829 2 2
6 20190829 2 2
7 20190830 2 2
8 20190830 2 2
9 20190830 2 2
10 20190830 2 2
11 20190830 1 3
12 20190830 1 3
But I need a result:
data2<-data.frame(Date=c(rep(20190829,6),rep(20190830,6)),Tr=c(rep(1,4),rep(2,6),rep(1,2)),id=c(rep(1,4),rep(2,2),rep(3,4),rep(4,2)))
Date Tr Id
1 20190829 1 1
2 20190829 1 1
3 20190829 1 1
4 20190829 1 1
5 20190829 2 2
6 20190829 2 2
7 20190830 2 3
8 20190830 2 3
9 20190830 2 3
10 20190830 2 3
11 20190830 1 4
12 20190830 1 4
How can i do it using R? is there any package which would give quick results?
Upvotes: 1
Views: 55
Reputation: 33488
data.table
has a convenience function for this
library(data.table)
data$Id <- rleid(data$Date, data$Tr)
# Date Tr Id
# 1 20190829 1 1
# 2 20190829 1 1
# 3 20190829 1 1
# 4 20190829 1 1
# 5 20190829 2 2
# 6 20190829 2 2
# 7 20190830 2 3
# 8 20190830 2 3
# 9 20190830 2 3
# 10 20190830 2 3
# 11 20190830 1 4
# 12 20190830 1 4
Upvotes: 2
Reputation: 51592
An idea via base R can be,
i1 <- with(data, ave(Tr, Date, FUN = function(i) cumsum(c(TRUE, diff(i) != 0))))
cumsum(c(TRUE, diff(i1) != 0))
#[1] 1 1 1 1 2 2 3 3 3 3 4 4
#or add it to your data frame,
data$new_id <- cumsum(c(TRUE, diff(i1) != 0))
Upvotes: 1