Rename duplicate strings (or values) under a column in a data frame in R

Question

My question is a variation of a question asked previously. I have a data frame with duplicate (repeating) values in Column2, as follows:

df <- read.table(text='Column1   Column2  
 1         A 
 2         B 
 3         C 
 4         B 
 5         B 
 6         A
 7         C
 8         D ', header=TRUE)

The duplicate values do not follow any particular order. I want to rename them so that they can be told apart; any naming scheme is fine. Values that occur only once (such as 'D' above) should remain unchanged. For example, the transformed column could look like this:

   Column1   Column2
     1         A1 
     2         B2 
     3         C1 
     4         B3 
     5         B4 
     6         A2
     7         C2
     8         D

Or like this:

 Column1   Column2
     1         Ax 
     2         Bx 
     3         Cx 
     4         By 
     5         Bz 
     6         Ay
     7         Cy
     8         D

where x, y and z are any digits or literals (even A.x or A_x are OK).

I have tried the following solution. It does rename the duplicate values, but it replaces the unique values with numbers.

n <- transform(df, Column.new = ifelse(duplicated(Column2) | duplicated(Column2, fromLast = TRUE), paste(Column2, seq_along(Column2), sep = ""), Column2))

The result is:

    Column1   Column2  Column.new
1       1       A          A1
2       2       B          B2
3       3       C          C3
4       4       B          B4
5       5       B          B5
6       6       A          A6
7       7       C          C7
8       8       D          4

The value 'D' (last row) should have stayed as it is instead of being replaced by '4' in Column.new.

I shall be grateful for a solution.

jeremycg · Accepted Answer

Using dplyr:

library(dplyr)

df %>% group_by(Column2) %>%
       # number rows within each group, but only if the value occurs more than once
       mutate(new2 = if (n() > 1) {paste0(Column2, row_number())}
                     else {paste0(Column2)})
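
If dplyr is not available, the same per-group numbering can be sketched in base R with ave(). This is only an illustration, not part of the answer above, and it assumes Column2 is a character vector (hence stringsAsFactors = FALSE in the read.table call). The '4' in the question most likely appears because Column2 was read as a factor: ifelse() drops the factor attributes, so the unique value comes back as its underlying integer code.

```r
# Example data from the question; stringsAsFactors = FALSE keeps Column2
# as character regardless of the R version in use.
df <- read.table(text = 'Column1 Column2
1 A
2 B
3 C
4 B
5 B
6 A
7 C
8 D', header = TRUE, stringsAsFactors = FALSE)

# ave() applies FUN to each group of identical Column2 values and returns
# the results in the original row order.
df$Column.new <- ave(df$Column2, df$Column2, FUN = function(x) {
  # append a within-group counter only when the value is duplicated
  if (length(x) > 1) paste0(x, seq_along(x)) else x
})

print(df)
```

Here the counter restarts for each distinct value, so the duplicates become A1, A2, B1, B2, B3 and so on, while 'D' is left untouched.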
