Abhishek Gupta

Reputation: 77

Numbering rows from 1 to n within groups in a data frame

For a dataframe like this:

   cat        val  
1  aaa 0.05638315  
2  aaa 0.25767250  
3  aaa 0.30776611  
4  aaa 0.46854928  
5  aaa 0.55232243  
6  bbb 0.17026205  
7  bbb 0.37032054  
8  bbb 0.48377074  
9  bbb 0.54655860  
10 bbb 0.81240262  
11 ccc 0.28035384  
12 ccc 0.39848790  
13 ccc 0.62499648  
14 ccc 0.76255108  
15 ccc 0.88216552 

I want to assign repeating sequence numbers to the rows within each group: the numbers run from 1 to 3, and then the sequence starts again from 1 within the same group:

   cat        val num  
1  aaa 0.05638315   1  
2  aaa 0.25767250   2  
3  aaa 0.30776611   3  
4  aaa 0.46854928   1  
5  aaa 0.55232243   2  
6  bbb 0.17026205   1  
7  bbb 0.37032054   2  
8  bbb 0.48377074   3  
9  bbb 0.54655860   1  
10 bbb 0.81240262   2  
11 ccc 0.28035384   1  
12 ccc 0.39848790   2  
13 ccc 0.62499648   3  
14 ccc 0.76255108   1  
15 ccc 0.88216552   2

How can I achieve it?

Upvotes: 0

Views: 750

Answers (3)

talat

Reputation: 70256

Here's a classic split / apply / combine approach:

df <- unsplit(lapply(split(df, df$cat), function(x) 
              cbind(x, id = rep(1:3, length.out = nrow(x)))), df$cat)

#    cat        val id
# 1  aaa 0.05638315  1
# 2  aaa 0.25767250  2
# 3  aaa 0.30776611  3
# 4  aaa 0.46854928  1
# 5  aaa 0.55232243  2
# 6  bbb 0.17026205  1
# 7  bbb 0.37032054  2
# 8  bbb 0.48377074  3
# 9  bbb 0.54655860  1
# 10 bbb 0.81240262  2
# 11 ccc 0.28035384  1
# 12 ccc 0.39848790  2
# 13 ccc 0.62499648  3
# 14 ccc 0.76255108  1
# 15 ccc 0.88216552  2

And a dplyr alternative:

library(dplyr)
df %>% group_by(cat) %>% mutate(id = rep(1:3, length.out = n()))

And a data.table alternative, too:

library(data.table)
setDT(df)
df[, id := rep(1:3, length.out = .N), by = cat]
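All three variants above rely on the same building block, rep() with length.out, which recycles 1:3 and cuts the result off at the group size. A minimal sketch of that behaviour, plus an equivalent modulo formulation for an arbitrary cycle length (cycle_ids is a hypothetical helper name used only for illustration):

```r
# rep() with length.out recycles 1:3 and truncates to the requested length
rep(1:3, length.out = 5)
# → 1 2 3 1 2
rep(1:3, length.out = 2)
# → 1 2

# The same sequence via modular arithmetic, for an arbitrary cycle length k
# (cycle_ids is a hypothetical helper, not part of any package)
cycle_ids <- function(n, k = 3L) (seq_len(n) - 1L) %% k + 1L
cycle_ids(5)
# → 1 2 3 1 2
```

Because length.out handles the truncation, there is no need to compute how many full repetitions fit into a group.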

Upvotes: 2

Eric Lecoutre

Reputation: 1481

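Here is a concise solution with tapply. tapply returns one vector per group, so the result is flattened with unlist before it is assigned (this relies on the rows being sorted by cat):

df <- data.frame(cat = rep(letters[1:3], each = 5), val = rnorm(3 * 5))
# tapply() yields one id vector per group; unlist() concatenates them in group order
df[, "n"] <- unlist(tapply(df[, "val"], df[, "cat"],
                           function(vec) rep(1:3, length.out = length(vec))),
                    use.names = FALSE)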
df

with result

> df
   cat         val n
1    a -0.01160222 1
2    a  0.13296221 2
3    a -0.19907366 3
4    a -0.52969178 1
5    a  0.05834779 2
6    b  1.06572206 1
7    b  1.23418529 2
8    b -2.53532404 3
9    b -0.77518265 1
10   b -1.35705148 2
11   c -1.16828739 1
12   c -0.32130593 2
13   c  0.98217935 3
14   c  0.31917671 1
15   c  0.89867657 2
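tapply returns one element per group rather than one value per row, so it only lines up with the data frame when the rows are already ordered by group. As a hedged alternative, base R's ave() returns a vector aligned with the original rows. The sketch below uses a made-up data frame (df2, with unequal group sizes and unsorted rows) purely for illustration:

```r
# Illustrative data: unequal group sizes, rows not sorted by cat
df2 <- data.frame(cat = c("b", "a", "a", "b", "a", "a", "c"),
                  val = c(0.7, 0.1, 0.2, 0.9, 0.3, 0.4, 0.5))

# ave() applies FUN within each group and puts the results back
# in the positions of the original rows
df2$n <- ave(seq_along(df2$cat), df2$cat,
             FUN = function(i) rep(1:3, length.out = length(i)))
df2$n
# → 1 1 2 2 3 1 1
```

Note that FUN has to be passed by name, because ave() treats every unnamed argument after the first as a grouping variable.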

Upvotes: 0

Damiano Fantini

Reputation: 1975

This should do the trick. You can get the unique cats in your data.frame, extract the corresponding rows, and then build an integer vector that cycles through the sequence (1, 2, 3). The count restarts at 1 for each cat. Note that the pieces are concatenated in the order of unique(df$cat), so this assumes the rows of each cat are stored contiguously.

df <- data.frame(cat=c(rep("aaa", 5), rep("bbb", 2), rep("ccc", 4), rep("ddd", 7)), 
                 val = rnorm(n = 18))

df$num <- do.call(c, lapply(unique(df$cat), (function(i){
  slice <- df[df$cat==i,]
  rep(1:3, 1+as.integer(nrow(slice)/3))[1:nrow(slice)]
})))

The final result looks like this (first rows shown):

   cat         val num
1  aaa -0.20791826   1
2  aaa  1.95733315   2
3  aaa  1.01099852   3
4  aaa  0.25355751   1
5  aaa  0.70946906   2
6  bbb  1.60555603   1
7  bbb -0.05718921   2
8  ccc  0.13465897   1

Upvotes: 0
