Reputation: 37
I have a data frame with numeric and character columns in which some rows are duplicates. To distinguish those rows, I want to add a new column (called "duplicateID" in my example) that numbers the rows within each "block" of duplicates from 1 to n.
My dataset looks like this:
a = c("one", "one", "one", "one", "two", "two", "three", "four", "four", "four")
b = c(3.5, 3.5, 3.5, 2.5, 3.5, 3.5, 1, 2.2, 7, 7)
df1 <-data.frame(a,b)
> df1
       a   b
1    one 3.5
2    one 3.5
3    one 3.5
4    one 2.5
5    two 3.5
6    two 3.5
7  three 1.0
8   four 2.2
9   four 7.0
10  four 7.0
Desired output is:
a = c("one", "one", "one", "one", "two", "two", "three", "four", "four", "four")
b = c(3.5, 3.5, 3.5, 2.5, 3.5, 3.5, 1, 2.2, 7, 7)
duplicateID = c(1, 2, 3, 1, 1, 2, 1, 1, 1, 2)
df2 <-data.frame(a,b,duplicateID)
> df2
       a   b duplicateID
1    one 3.5           1
2    one 3.5           2
3    one 3.5           3
4    one 2.5           1
5    two 3.5           1
6    two 3.5           2
7  three 1.0           1
8   four 2.2           1
9   four 7.0           1
10  four 7.0           2
Thank you all in advance!
Upvotes: 1
Views: 2344
Reputation: 887251
We could use rowid() from data.table, which numbers the rows within each combination of a and b:
library(data.table)
setDT(df1)[, dupID := rowid(a, b)]
-output
> df1
        a   b dupID
 1:   one 3.5     1
 2:   one 3.5     2
 3:   one 3.5     3
 4:   one 2.5     1
 5:   two 3.5     1
 6:   two 3.5     2
 7: three 1.0     1
 8:  four 2.2     1
 9:  four 7.0     1
10:  four 7.0     2
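
For comparison, the same numbering can be written out as an explicit per-group counter. This is only a sketch, assuming data.table is installed; the name dt is mine and not part of the answer above. It counts from 1 to .N (the group size) within each combination of a and b and checks that the result agrees with rowid().

```r
library(data.table)

# Example data from the question, built directly as a data.table.
dt <- data.table(
  a = c("one", "one", "one", "one", "two", "two", "three", "four", "four", "four"),
  b = c(3.5, 3.5, 3.5, 2.5, 3.5, 3.5, 1, 2.2, 7, 7)
)

# .N is the number of rows in the current group, so seq_len(.N)
# counts 1..n within each (a, b) combination.
dt[, dupID := seq_len(.N), by = .(a, b)]

# The explicit counter should match what rowid() returns.
stopifnot(identical(dt$dupID, rowid(dt$a, dt$b)))
```

rowid(a, b) is the shorter form; the grouped version may be easier to adapt if more work has to happen per group.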
Upvotes: 4
Reputation: 4344
One way to achieve this with dplyr:
library(dplyr)
df1 %>%
  # build grouping by combination of variables
  dplyr::group_by(a, b) %>%
  # add row number, which works per group due to the prior grouping
  dplyr::mutate(duplicateID = dplyr::row_number()) %>%
  # ungroup to prevent unexpected behaviour downstream
  dplyr::ungroup()
# A tibble: 10 x 3
   a         b duplicateID
   <chr> <dbl>       <int>
 1 one     3.5           1
 2 one     3.5           2
 3 one     3.5           3
 4 one     2.5           1
 5 two     3.5           1
 6 two     3.5           2
 7 three   1             1
 8 four    2.2           1
 9 four    7             1
10 four    7             2
Upvotes: 5
Reputation: 362
This might not be as fast as dplyr (and I'm sure data.table has options too), but in base R you can achieve this with the "ave" function combined with "seq_along":
a = c("one", "one", "one", "one", "two", "two", "three", "four", "four", "four")
b = c(3.5, 3.5, 3.5, 2.5, 3.5, 3.5, 1, 2.2, 7, 7)
df1 <-data.frame(a,b)
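# placeholder column; ave() only needs a vector with one entry per row
df1$dupID <- NA
# number the rows within each combination of b and a
df1$dupID <- with(df1, ave(dupID, b, a, FUN = seq_along))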
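
A slightly shorter variant, offered as a sketch rather than a replacement for the code above: the NA placeholder column can be skipped by passing seq_along(a) to ave() as a dummy vector of the right length. The expected vector is the duplicateID column from the question; the name "expected" is my own.

```r
# Rebuild the example data from the question.
df1 <- data.frame(
  a = c("one", "one", "one", "one", "two", "two", "three", "four", "four", "four"),
  b = c(3.5, 3.5, 3.5, 2.5, 3.5, 3.5, 1, 2.2, 7, 7)
)

# seq_along(a) is only a placeholder with one entry per row; ave()
# replaces each group's values with FUN applied to that group, so every
# (a, b) combination gets numbered 1..n.
df1$dupID <- with(df1, ave(seq_along(a), a, b, FUN = seq_along))

# Compare with the duplicateID column given in the question.
expected <- c(1, 2, 3, 1, 1, 2, 1, 1, 1, 2)
stopifnot(all(df1$dupID == expected))
```

Because the placeholder is an integer vector, dupID comes back as an integer column.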
Upvotes: 2