puginablanket

Reputation: 187

Note and recode duplicates

I have a dataframe that's similar to what's below:

num <- c(1, 2, 3, 4)
name <- c("A", "B", "C", "A")
df <- cbind(num, name)

I'm looking to essentially turn this into:

num <- c(1, 2, 3, 4)
name <- c("A1", "B", "C", "A2")
df <- cbind(num, name)

How would I do this automatically, since my actual data is much larger?

Upvotes: 0

Views: 61

Answers (4)

Ben Bolker

Reputation: 226871

It might be worth considering the built-in make.unique(), although it doesn't do exactly what the OP wants: it leaves the first occurrence of a duplicated value unlabeled, which is what allows it to be run multiple times in succession. A little extra care is also needed if name is a factor (the default for string columns in data.frame() before R 4.0.0), hence the as.character() call:

df <- data.frame(num = c(1, 2, 3, 4),
                 name = c("A", "B", "C", "A"))
df <- transform(df, name=factor(make.unique(
                          as.character(name),sep="")))
##   num name
## 1   1    A
## 2   2    B
## 3   3    C
## 4   4   A1
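
To illustrate the point about running it multiple times, here is a minimal sketch on the question's example values. The variable names x, once and twice are made up for illustration; only make.unique() itself comes from the answer above.

```r
# Example values from the question
x <- c("A", "B", "C", "A")

# First pass: only the second "A" gets a suffix
once <- make.unique(x, sep = "")

# Second pass: every value is already unique, so nothing changes
twice <- make.unique(once, sep = "")

identical(once, twice)
```

Because the first occurrence keeps its original label, a second pass has nothing left to rename in this example.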

Upvotes: 1

ebyerly

Reputation: 672

Puginablanket,

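See below for two solutions: the first uses base R's by and do.call functions, and the second uses the plyr package. Note that both append a running index to every name (including names that occur only once), store it in a new column name2, and return the rows grouped by name rather than in their original order.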

eg <- data.frame(num = c(1, 2, 3, 4, 5),
                 name = c("A", "B", "C", "A", "B"),
                 stringsAsFactors = FALSE)

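# Base R: split by name, number the rows within each group, then recombine
do.call(rbind, by(eg, eg$name, function(x) {
  x$name2 <- paste0(x$name, seq_len(nrow(x)))
  x
}))

# plyr: the same split-apply-combine step in a single call
plyr::ddply(eg, "name", function(x) {
  x$name2 <- paste0(x$name, seq_len(nrow(x)))
  x
})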

Depending on your application, it might make sense to create a separate column which tracks this duplication (so that you're not using string parsing at a later step to pull it back apart).
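
A minimal sketch of that separate-column idea, using base R's ave() so the original row order is kept. The column names occurrence and name2 are only illustrative choices, not anything required by the question.

```r
# Example data, mirroring the structure used above
eg <- data.frame(num = 1:5,
                 name = c("A", "B", "C", "A", "B"),
                 stringsAsFactors = FALSE)

# Running count of each name, stored in its own integer column
eg$occurrence <- ave(seq_along(eg$name), eg$name, FUN = seq_along)

# The labelled name can be rebuilt from the two columns whenever needed
eg$name2 <- paste0(eg$name, eg$occurrence)
```

With the count held separately, later steps can filter or sort on occurrence directly instead of parsing it back out of the string.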

Upvotes: 1

bgoldst

Reputation: 35324

Here's a one-line solution, assuming you really do have a data.frame rather than a matrix (a matrix is what is returned by your cbind() command):

df <- data.frame(num = 1:4, name = c('A', 'B', 'C', 'A'))
transform(df, name = paste0(name, ave(c(name), name,
  FUN = function(x) if (length(x) > 1) seq_along(x) else '')))
##   num name
## 1   1   A1
## 2   2    B
## 3   3    C
## 4   4   A2

Upvotes: 0

user2957945

Reputation: 2413

I converted your matrix to a data frame:

df <- data.frame(num, name)

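# Number each occurrence of a name (1 for the first, 2 for the second, ...)
ext <- as.numeric(ave(as.character(df$name), df$name,
                      FUN = function(x) cumsum(duplicated(x)) + 1))

# Names that occur more than once
nms <- df$name[ext > 1]

# Append the occurrence number to duplicated names only
df$newname <- ifelse(df$name %in% nms, paste0(df$name, ext), as.character(df$name))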

Upvotes: 0
