Gughan

Reputation: 133

Running count based on field in R

I have a data set in this format:

User       
1 
2
3
2
3
1  
1      

Now I want to add a column, Count, that gives the running count of occurrences of each user, so that the nth appearance of a user gets the value n. I want the output in the format below.

User    Count
1       1
2       1 
3       1
2       2
3       2
1       2
1       3

I have found a few solutions, such as those in the question "Running count variable in R", but all of them are somewhat slow.

My data.frame has 100,000 rows now and may soon grow to 1 million, so I need a solution that is also fast.

Upvotes: 8

Views: 6969

Answers (3)

akrun

Reputation: 887118

An option using dplyr

 library(dplyr)
 df1 %>%
      group_by(User) %>%
      mutate(Count=row_number())
 #    User Count
 #1    1     1
 #2    2     1
 #3    3     1
 #4    2     2
 #5    3     2
 #6    1     2
 #7    1     3

Using sqldf

library(sqldf)
sqldf('select a.*, 
           count(*) as Count
           from df1 a, df1 b
           where a.User = b.User and b.rowid <= a.rowid
           group by a.rowid')
#   User Count
#1    1     1
#2    2     1
#3    3     1
#4    2     2
#5    3     2
#6    1     2
#7    1     3

Upvotes: 12

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

You can use getanID from my "splitstackshape" package:

library(splitstackshape)
getanID(mydf, "User")
##    User .id
## 1:    1   1
## 2:    2   1
## 3:    3   1
## 4:    2   2
## 5:    3   2
## 6:    1   2
## 7:    1   3

This is essentially an approach with "data.table" that looks something like the following:

as.data.table(mydf)[, count := seq(.N), by = "User"][]
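For reference, here is a self-contained sketch of the data.table approach on the sample data from the question. The data frame name mydf and the column names count and count2 are only illustrative. It uses seq_len(.N) in place of seq(.N), which is equivalent here, and also shows data.table's rowid() helper as a possible alternative (assuming a data.table version that provides it, 1.9.8 or later).

```r
library(data.table)

# Sample data from the question
mydf <- data.frame(User = c(1, 2, 3, 2, 3, 1, 1))
dt <- as.data.table(mydf)

# Running count within each User group; .N is the group size
dt[, count := seq_len(.N), by = User]

# Alternative: rowid() numbers repeated values in order of appearance
dt[, count2 := rowid(User)]

print(dt)
```

Both columns should hold the same running count; the original row order is preserved because := adds columns by reference without reordering.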

Upvotes: 5

IRTFM

Reputation: 263342

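This is fairly easy with ave and seq_along (seq.int gives the same result here, but seq_along is safer when a group has only a single row):

> ave(User, User, FUN = seq_along)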
[1] 1 1 1 2 2 2 3

This is a common strategy for numbering items within groups, and it is often used when the items of a group are adjacent to each other. The second argument is the grouping variable. The first argument is really a dummy argument in this case, since the only thing it contributes is a length. ave does not require the rows of a group to be adjacent: the values are computed within each group and returned in the original row order.
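As a minimal base R sketch of this, the code below applies ave to the sample data from the question, assuming it is stored in a data frame named df1 (the name used in another answer). It uses seq_along as the function, and the second call makes the dummy role of the first argument explicit by passing a plain index vector, which also keeps the result an integer vector when the grouping column is not numeric. The users vector is a made-up illustration.

```r
# Sample data from the question
df1 <- data.frame(User = c(1, 2, 3, 2, 3, 1, 1))

# seq_along() numbers the rows of each group 1, 2, 3, ...;
# ave() returns the values in the original row order
df1$Count <- ave(df1$User, df1$User, FUN = seq_along)
print(df1)

# The first argument only supplies a length, so an index vector works too.
# 'users' is a hypothetical character column with non-adjacent groups.
users <- c("a", "b", "a", "c", "b", "a")
running <- ave(seq_along(users), users, FUN = seq_along)
print(running)
```

The rows for each user are not adjacent in either example, and the counts still come back aligned with the original rows.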

Upvotes: 6
