Reputation: 625
I want to count the number of duplicates in a column and add them in another column to the data base.
For example, some data
a <- c(1,1,2,3,4,4)
b <- c("A","A","C","C","D","D")
df <- data.frame(a,b)
This is the result I am looking for:
a b count
1 1 A 1
2 1 A 2
3 2 C 1
4 3 C 1
5 4 D 1
6 4 D 2
Upvotes: 1
Views: 180
Reputation: 1313
I had a similar problem, but only needed to count duplicates based on the information in 1 column. user7298145's answer worked well for a small data frame, but my data has ~20k lines and failed with the error:
Error: memory exhausted (limit reached?)
Error during wrapup: memory exhausted (limit reached?)
So I created a for
loop that did the trick:
## order the values that are duplicated
primary_duplicated <- primary_duplicated1[order(primary_duplicated1$md5), ]
## create blank/NA column
primary_duplicated$count <- NA
## set first value as 1
primary_duplicated$count[1] <- 1
## set count of duplicates to 1 greater than the
## value of the preceding duplicate
for (i in 2:nrow(primary_duplicated)) {
if (primary_duplicated$md5[i] == primary_duplicated$md5[i-1]) {
primary_duplicated$count[i] <- primary_duplicated$count[i-1] + 1
} else {
## set the count value for the first incidence of
## a duplicate as 1
primary_duplicated$count[i] <- 1
}
}
Upvotes: 1
Reputation: 886938
We can do this with data.table
library(data.table)
setDT(df)[, count := seq_len(.N), .(a, b)]
df
# a b count
#1: 1 A 1
#2: 1 A 2
#3: 2 C 1
#4: 3 C 1
#5: 4 D 1
#6: 4 D 2
Upvotes: 1