Reputation: 625

tally number of duplicates and put them in a column in a data frame

I want to count the number of duplicates in a column and add them in another column to the data base.

For example, some data

a <- c(1,1,2,3,4,4)
b <- c("A","A","C","C","D","D")

df <- data.frame(a,b)

This is the result I am looking for:

  a b count
1 1 A     1
2 1 A     2
3 2 C     1
4 3 C     1
5 4 D     1
6 4 D     2

Upvotes: 1

Answers (4)

Josh

Reputation: 1337

I had a similar problem, but only needed to count duplicates based on the information in 1 column. user7298145's answer worked well for a small data frame, but my data has ~20k lines and failed with the error:

Error: memory exhausted (limit reached?)
Error during wrapup: memory exhausted (limit reached?)

So I created a for loop that did the trick:

##  order the values that are duplicated
primary_duplicated <- primary_duplicated1[order(primary_duplicated1$md5), ]
##  create blank/NA column
primary_duplicated$count <- NA
##  set first value as 1
primary_duplicated$count[1] <- 1
##  set count of duplicates to 1 greater than the 
##  value of the preceding duplicate
for (i in 2:nrow(primary_duplicated)) {
      if (primary_duplicated$md5[i] == primary_duplicated$md5[i-1]) {
            primary_duplicated$count[i] <- primary_duplicated$count[i-1] + 1
      } else {
      ##  set the count value for the first incidence of
      ##  a duplicate as 1
            primary_duplicated$count[i] <- 1
      }
}

Upvotes: 1

akrun

Reputation: 887881

We can do this with data.table

library(data.table)
setDT(df)[, count := seq_len(.N), .(a, b)]
df
#    a b count
#1: 1 A     1
#2: 1 A     2
#3: 2 C     1
#4: 3 C     1
#5: 4 D     1
#6: 4 D     2

Upvotes: 1

user7298145

Reputation:

df$count <- with(df, ave(rep(1, nrow(df)), b, a, FUN = cumsum))

Upvotes: 2

AK88

Reputation: 3026

Try this:

df$count = sequence(rle(df$a)$lengths)
df

Upvotes: 1

tally number of duplicates and put them in a column in a data frame

Answers (4)

Related Questions