Reputation: 1437
I am using R and I want to create a column showing a sequence or rank, while grouping by two factors (hhid and perid).
For example, I have this data set:
hhid perid
1000 1
1000 1
1000 1
1000 2
1000 2
2000 1
2000 1
2000 1
2000 1
2000 2
2000 2
I want to add a column called "actno" like this:
hhid perid actno
1000 1 1
1000 1 2
1000 1 3
1000 2 1
1000 2 2
2000 1 1
2000 1 2
2000 1 3
2000 1 4
2000 2 1
2000 2 2
Upvotes: 2
Views: 1877
Reputation: 26612
Pseudocode:
For each unique value `h` of `hhid`
    For each unique value `p` of `perid`
        counter = 0
        For each row of the table where `hhid == h && perid == p`
            counter++
            Assign counter to `actno` of this row
Should be trivial to implement, especially with a data frame.
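A direct base R translation of the pseudocode above might look like the sketch below. The data frame name `dat` and the sample values are taken from the question; the explicit loops are only meant to mirror the pseudocode step by step, not to be fast.

```r
# Sample data from the question (the name `dat` is an assumption)
dat <- data.frame(
  hhid  = c(1000, 1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000, 2000, 2000),
  perid = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2)
)

dat$actno <- NA_integer_
for (h in unique(dat$hhid)) {
  for (p in unique(dat$perid)) {
    counter <- 0L
    # Row indices belonging to the current (hhid, perid) group
    for (i in which(dat$hhid == h & dat$perid == p)) {
      counter <- counter + 1L
      dat$actno[i] <- counter
    }
  }
}
dat
```

Combinations of `h` and `p` that do not occur in the data simply yield no rows, so the inner loop is skipped for them.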
Upvotes: -4
Reputation: 115390
If you have lots of groups or large data, data.table is the way to go for efficiency of time and memory
# assuming your data is in a data.frame called DF
library(data.table)
DT <- data.table(DF)
DT[, ActNo := seq_len(.N), by = list(hhid,perid)]
Note that .N gives the number of rows in each subset defined by the grouping (see ?data.table for more details).
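As a hedged variant of the same idea: `setDT()` converts an existing data.frame to a data.table by reference, which avoids the copy made by `data.table(DF)`. The sketch below assumes the data.table package is installed and uses the question's sample values under the assumed name `DF`; the new column is called `actno` here to match the question.

```r
library(data.table)

# Sample data from the question (the name `DF` is an assumption)
DF <- data.frame(
  hhid  = c(1000, 1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000, 2000, 2000),
  perid = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2)
)

setDT(DF)  # convert in place; DF is now a data.table
# seq_len(.N) numbers the rows 1..N within each (hhid, perid) group
DF[, actno := seq_len(.N), by = .(hhid, perid)]
DF
```

Because `:=` adds the column by reference, no reassignment of `DF` is needed after the grouped call.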
Upvotes: 4
Reputation: 263352
No need for plyr. Just use ave and seq:
> dat$actno <- with( dat, ave(hhid, hhid, perid, FUN=seq))
> dat
hhid perid actno
1 1000 1 1
2 1000 1 2
3 1000 1 3
4 1000 2 1
5 1000 2 2
6 2000 1 1
7 2000 1 2
8 2000 1 3
9 2000 1 4
10 2000 2 1
11 2000 2 2
The first argument in this instance could be either column, or you could do it with the slightly less elegant but perhaps clearer:
dat$actno <- with( dat, ave(hhid, hhid, perid, FUN=function(x) seq(length(x) ) ) )
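One caveat worth spelling out: `seq()` applied to a length-one vector such as `1000` returns `1:1000`, so `FUN=seq` can misbehave for groups that contain a single row. A sketch using `seq_along`, which always counts positions, sidesteps this. The extra household 3000 below is made-up data, added only to exercise a single-row group.

```r
dat <- data.frame(
  hhid  = c(1000, 1000, 1000, 2000, 2000, 3000),  # 3000 is a hypothetical single-row group
  perid = c(1, 1, 2, 1, 1, 1)
)

# seq_along(x) is 1..length(x) regardless of the values stored in x
dat$actno <- with(dat, ave(hhid, hhid, perid, FUN = seq_along))
dat

length(seq(3000))        # 3000: seq() treats a single number as an upper bound
length(seq_along(3000))  # 1
```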
Upvotes: 3
Reputation: 15441
If your data is called urdat, then without plyr you can do:
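df <- urdat[order(urdat$hhid, urdat$perid),]
# take run lengths over the hhid/perid pair so adjacent groups sharing a perid are not merged
df$actno <- sequence(rle(paste(df$hhid, df$perid))$lengths)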
Upvotes: 2
Reputation: 43255
The plyr package can do this nicely:
library(plyr)
dat <- structure(list(hhid = c(1000L, 1000L, 1000L, 1000L, 1000L, 2000L,
2000L, 2000L, 2000L, 2000L, 2000L), perid = c(1L, 1L, 1L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L)), .Names = c("hhid", "perid"), class = "data.frame", row.names = c(NA,
-11L))
ddply(dat, .(hhid, perid), transform, actno=seq_along(perid))
hhid perid actno
1 1000 1 1
2 1000 1 2
3 1000 1 3
4 1000 2 1
5 1000 2 2
6 2000 1 1
7 2000 1 2
8 2000 1 3
9 2000 1 4
10 2000 2 1
11 2000 2 2
Upvotes: 1