Paul Chupakhin
Paul Chupakhin

Reputation: 47

Indexing by two columns consistently

How can I create a consistently column 'id' using two columns? I need to get an index column 'id' by 'Date' and 'Tr' respectively. I have data:

Date    Tr
1   20190829    1
2   20190829    1
3   20190829    1
4   20190829    1
5   20190829    2
6   20190829    2
7   20190830    2
8   20190830    2
9   20190830    2
10  20190830    2
11  20190830    1
12  20190830    1

Example data:

data<-data.frame(Date=c(rep(20190829,6),rep(20190830,6)),Tr=c(rep(1,4),rep(2,6),rep(1,2)))

I'm trying to use somthing like this:

data$id <- cumsum(c(TRUE, diff(data$Tr) != 0))

    Date    Tr  Id
1   20190829    1   1
2   20190829    1   1
3   20190829    1   1
4   20190829    1   1
5   20190829    2   2
6   20190829    2   2
7   20190830    2   2
8   20190830    2   2
9   20190830    2   2
10  20190830    2   2
11  20190830    1   3
12  20190830    1   3

But I need a result:

data2<-data.frame(Date=c(rep(20190829,6),rep(20190830,6)),Tr=c(rep(1,4),rep(2,6),rep(1,2)),id=c(rep(1,4),rep(2,2),rep(3,4),rep(4,2)))

    Date    Tr  Id
1   20190829    1   1
2   20190829    1   1
3   20190829    1   1
4   20190829    1   1
5   20190829    2   2
6   20190829    2   2
7   20190830    2   3
8   20190830    2   3
9   20190830    2   3
10  20190830    2   3
11  20190830    1   4
12  20190830    1   4

How can i do it using R? is there any package which would give quick results?

Upvotes: 1

Views: 55

Answers (2)

s_baldur
s_baldur

Reputation: 33488

data.table has a convenience function for this

library(data.table)
data$Id <- rleid(data$Date, data$Tr)

#        Date Tr Id
# 1  20190829  1  1
# 2  20190829  1  1
# 3  20190829  1  1
# 4  20190829  1  1
# 5  20190829  2  2
# 6  20190829  2  2
# 7  20190830  2  3
# 8  20190830  2  3
# 9  20190830  2  3
# 10 20190830  2  3
# 11 20190830  1  4
# 12 20190830  1  4

Upvotes: 2

Sotos
Sotos

Reputation: 51592

An idea via base R can be,

i1 <- with(data, ave(Tr, Date, FUN = function(i) cumsum(c(TRUE, diff(i) != 0))))

cumsum(c(TRUE, diff(i1) != 0))
#[1] 1 1 1 1 2 2 3 3 3 3 4 4

#or add it to your data frame,

data$new_id <- cumsum(c(TRUE, diff(i1) != 0))

Upvotes: 1

Related Questions