stefano

Reputation: 415

Generate a sequence of numbers in R according to other variables

I have a problem generating a sequence of numbers based on two other variables. Specifically, I have the following DB (my real DB is not so balanced!):

ID1=rep((1:1),20)
ID2=rep((2:2),20)
ID3=rep((3:3),20)
ID<-c(ID1,ID2,ID3)
DATE1=rep("2013-1-1",10)
DATE2=rep("2013-1-2",10)
DATE=c(DATE1,DATE2)
IN<-data.frame(ID,DATE=rep(DATE,3))

and I would like to generate a sequence of numbers that counts the observations within each ID for each DATE, like this:

OUTPUT<-data.frame(ID,DATE=rep(DATE,3),N=rep(rep(seq(1:10),2),3))

Curiously, I tried the following solution, which works for the DB provided above but not for the real DB!

IN$UNIQUE<-with(IN,as.numeric(interaction(IN$ID,IN$DATE,drop=TRUE,lex.order=TRUE)))#generate unique value for the combination of id and date
PROG<-tapply(IN$DATE,IN$UNIQUE,seq)#generate the sequence
OUTPUT$SEQ<-c(sapply(PROG,"["))#concatenate the sequence in just one vector

Right now I cannot understand why the solution doesn't work for the real DB. As always, any tips are greatly appreciated!

Here is an example (just one ID included) of the data set:

    id       date
1 F2_G 2005-03-09
2 F2_G 2005-06-18
3 F2_G 2005-06-18
4 F2_G 2005-06-18
5 F2_G 2005-06-19
6 F2_G 2005-06-19
7 F2_G 2005-06-19
8 F2_G 2005-06-19
9 F2_G 2005-06-20

Upvotes: 3

Views: 5066

Answers (2)

Simon O'Hanlon

Reputation: 60000

This should do what you want...

require(reshape2)
as.vector( apply( dcast( IN , ID ~ DATE , length )[,-1] , 1:2 , function(x)seq.int(x) ) )
 [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6
[27]  7  8  9 10  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10  1  2
[53]  3  4  5  6  7  8  9 10

Basically, we use dcast to get the number of observations by ID and date, like so:

dcast( IN , ID ~ DATE , length )
  ID 2013-1-1 2013-1-2
1  1       10       10
2  2       10       10
3  3       10       10

Then we use apply across each cell to make a sequence of integers as long as the count of observations for that ID and date. Finally, we coerce back to a vector using as.vector.
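
To illustrate the same count-then-expand idea on unbalanced data, here is a minimal base R sketch built on the sample rows from the question. It is not the dcast approach above: it swaps in rle() to get the counts and sequence() to expand them, and it assumes that rows sharing the same id and date are contiguous. The name real is hypothetical.

```r
# Sample rows from the question (one id, unbalanced dates);
# `real` is a hypothetical name for the real data set
real <- data.frame(
  id   = rep("F2_G", 9),
  date = c("2005-03-09", rep("2005-06-18", 3),
           rep("2005-06-19", 4), "2005-06-20"),
  stringsAsFactors = FALSE
)

# Length of each run of identical id/date pairs
# (assumes rows of the same id/date sit next to each other)
runs <- rle(paste(real$id, real$date))

# sequence() turns each count n into 1:n and concatenates the pieces
real$N <- sequence(runs$lengths)
real
```

If the rows are not grouped this way, sorting by id and date first (for example with order()) would be needed before this sketch applies.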

Upvotes: 3

Arun

Reputation: 118879

Here's one using ave:

OUT <- within(IN, {N <- ave(ID, list(ID, DATE), FUN=seq_along)})
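
A hedged note on the real data: ave() returns values of the same type as its first argument, so when the ID column is character or factor (as with F2_G in the question's sample), passing it as the first argument may not yield integer counters. A minimal sketch that passes seq_along() of the column instead, assuming the column names id and date from the sample and the hypothetical name real:

```r
# Sample rows from the question (one id, unbalanced dates);
# `real` is a hypothetical name for the real data set
real <- data.frame(
  id   = rep("F2_G", 9),
  date = c("2005-03-09", rep("2005-06-18", 3),
           rep("2005-06-19", 4), "2005-06-20"),
  stringsAsFactors = FALSE
)

# Use an integer vector as the first argument so the result is numeric,
# and group by id and date; FUN numbers the rows within each group
real$N <- ave(seq_along(real$id), real$id, real$date, FUN = seq_along)
real
```

Because ave() writes each group's result back to that group's original row positions, this does not depend on the rows being sorted.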

Upvotes: 6
