add running counter for semi-consecutive strings in vector

Question

I would like to add a number indicating the x-th occurrence of a word in a vector. (This question differs from "Make a column with duplicated values unique in a dataframe" because I have a simple vector and want to avoid the overhead of casting it to a data.frame.)

E.g. for the vector:

book, ship, umbrella, book, ship, ship

the output would be:

book, ship, umbrella, book2, ship2, ship3

I have solved this myself by converting the vector to a data frame and then using dplyr's grouping functions. That feels like using a sledgehammer to crack a nut:

library(dplyr)

# add consecutive number for equal string
words <- c("book", "ship", "umbrella", "book", "ship", "ship")

# convert word vector to data.frame for grouping
df <- data.frame(words = words)
df <- df %>% group_by(words) %>% mutate(seqN = row_number())

# combine columns and remove '1' for first occurrence
# (gsub() removes every "1" in each string, not just the counter of a first occurrence)
wordsVec <- paste0(df$words, df$seqN)
gsub("1", "", wordsVec)
# [1] "book"     "ship"     "umbrella" "book2"    "ship2"    "ship3"

Is there a cleaner solution, e.g. using the stringr package?

Sotos · Accepted Answer

You can still use row_number() from dplyr, but you don't need to convert to a data frame:

sub('1$', '', ave(words, words, FUN = function(i) paste0(i, row_number(i))))
#[1] "book"     "ship"     "umbrella" "book2"    "ship2"    "ship3"
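If you would rather not load dplyr at all, the same idea can be sketched in base R. This is only one possible variant, not part of the original answer: it builds the running count with ave() and seq_along(), then appends it from the second occurrence onward. Using ifelse() instead of sub('1$', '') also avoids stripping the trailing 1 from an 11th or 21st occurrence. The names n and result are just illustrative.

```r
words <- c("book", "ship", "umbrella", "book", "ship", "ship")

# Running count per word: ave() applies seq_along() within each group of
# identical words and returns the counts in the original order.
n <- ave(seq_along(words), words, FUN = seq_along)

# Append the counter only from the second occurrence onward,
# so first occurrences stay unchanged.
result <- ifelse(n == 1, words, paste0(words, n))
result
# [1] "book"     "ship"     "umbrella" "book2"    "ship2"    "ship3"
```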

Another option is to use make.unique, which appends .1, .2, ... to repeated values, along with gsubfn to increment each appended number by 1:

library(gsubfn)
gsubfn("\\d+", function(x) as.numeric(x) + 1, make.unique(words))
#[1] "book"     "ship"     "umbrella" "book.2"   "ship.2"   "ship.3"
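The make.unique route can also be done without gsubfn. The following is a rough base R sketch, assuming none of the original words already ends in a digit (otherwise that digit would be incremented too). It increments the suffix with regmatches() and then drops the dot to match the format asked for in the question. The names u, m and result are illustrative only.

```r
words <- c("book", "ship", "umbrella", "book", "ship", "ship")

# make.unique() leaves first occurrences alone and numbers repeats from 1
u <- make.unique(words)
# u is now: "book" "ship" "umbrella" "book.1" "ship.1" "ship.2"

# Locate the trailing digits (assumption: only make.unique() added them)
m <- regexpr("[0-9]+$", u)

# Increment the matched suffixes; elements without a match stay untouched
regmatches(u, m) <- as.character(as.integer(regmatches(u, m)) + 1L)

# Drop the dot separator to get book2 instead of book.2
result <- sub("\\.([0-9]+)$", "\\1", u)
result
# [1] "book"     "ship"     "umbrella" "book2"    "ship2"    "ship3"
```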
