Rumpl

numbering characters in a string

I want to number the letters in a large dataset. Some letters occur multiple times and are numbered ("A1", "A2"); others also occur multiple times but are not numbered. There are also letters that occur only once. The pattern is probably easiest to see in the example data below.

The numbers in df$nr are the desired result. How can I get df$nr from df$word and df$letter?

library(tibble)

df <- tibble(word = c(rep("Amamam", 17), rep("Bobob", 14)),
             letter = c("A1", "A1", "A1", "A1", "A2", "A2", "m", "m", "m", "a", "a", "m", "m", "a", "a", "m", "m",
                        "B1", "B1", "B2", "B2", "B3", "B3", "o", "b", "b", "b", "o", "o", "o", "b"),
             nr = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
                    1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5))

Answers (1)

akrun

We can group by 'word', remove the numeric part from the 'letter' column, and convert the result to a run-length id with rleid from data.table:

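library(dplyr)
library(stringr)
library(data.table)

df1 <- df %>% 
        # number the runs separately within each word
        group_by(word) %>%
        # drop the digits ("A1", "A2" -> "A"), then give each run of identical letters one id
        mutate(nr1 = rleid(str_remove(letter, "\\d+")))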

all.equal(df1$nr, df1$nr1)
#[1] TRUE
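If the data.table dependency is unwanted, the same idea can be sketched in base R. This is a minimal sketch assuming R 4.0 or later (so data.frame() keeps strings as characters); run_id is a hypothetical helper, not part of any package, that reproduces what rleid does for a single vector using rle(). ave() applies it separately within each word, so the numbering restarts at 1 for every word.

```r
# Sample data from the question, rebuilt as a base data.frame (no tibble needed)
df <- data.frame(
  word   = c(rep("Amamam", 17), rep("Bobob", 14)),
  letter = c("A1", "A1", "A1", "A1", "A2", "A2", "m", "m", "m", "a", "a", "m", "m", "a", "a", "m", "m",
             "B1", "B1", "B2", "B2", "B3", "B3", "o", "b", "b", "b", "o", "o", "o", "b"),
  nr     = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
             1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5)
)

# Hypothetical helper: base R stand-in for data.table::rleid on one vector.
# Every run of identical consecutive values gets the same id; the id grows by 1 per run.
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

# Drop the first run of digits so that "A1" and "A2" both count as "A"
base_letter <- sub("[0-9]+", "", df$letter)

# ave() calls FUN once per word; passing row indices (not the letters themselves)
# keeps the result integer instead of being coerced to character
df$nr2 <- ave(seq_along(base_letter), df$word,
              FUN = function(i) run_id(base_letter[i]))

all.equal(df$nr, df$nr2)
#[1] TRUE
```

Recent dplyr versions (1.1.0 and later) also provide consecutive_id(), which could take the place of rleid inside the mutate() call above if loading data.table is not desired.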