Extract the first two characters from a list of names in R

Question

The data frame df1 contains two columns: id and list_names

id <- seq(1,5)
list_names <- c("john", 
                "adam, sally", 
                "rebecca", 
                "zhang, mike, antonio", 
                "mark, henry, scott, john, steve, jason, nancy")

df1 <- data.frame(id, list_names)

I need to add an additional column that contains the first two characters extracted from every name.

The new column should contain the first two characters of each name, separated by commas; for example, "adam, sally" would give "ad, sa".

Note that the number of names per row is not fixed, so it should not have to be specified.

akrun · Accepted Answer

We can use str_extract_all to extract the two lowercase letters that follow each word boundary, then paste the matches for each row together with toString.

library(stringr)
library(dplyr)
library(purrr)
df1 %>%
     mutate(two_chars = str_extract_all(list_names, "\\b[a-z]{2}")  %>%
                            map_chr(toString))
#  id                                    list_names                  two_chars
#1  1                                          john                         jo
#2  2                                   adam, sally                     ad, sa
#3  3                                       rebecca                         re
#4  4                          zhang, mike, antonio                 zh, mi, an
#5  5 mark, henry, scott, john, steve, jason, nancy ma, he, sc, jo, st, ja, na
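For readers who want to avoid the package dependencies, the same word-boundary extraction can be sketched in base R with gregexpr and regmatches. This is an illustrative variant, not part of the original answer; the as.character() call is an assumption to cover R versions before 4.0, where data.frame() turns strings into factors by default.

```r
# Rebuild the example data from the question
id <- seq(1, 5)
list_names <- c("john",
                "adam, sally",
                "rebecca",
                "zhang, mike, antonio",
                "mark, henry, scott, john, steve, jason, nancy")
df1 <- data.frame(id, list_names)

# Locate every run of two lowercase letters that starts at a word boundary
x <- as.character(df1$list_names)
m <- gregexpr("\\b[a-z]{2}", x, perl = TRUE)

# regmatches() returns one character vector of matches per row;
# toString() collapses each vector into a single ", "-separated string
df1$two_chars <- vapply(regmatches(x, m), toString, character(1))
df1$two_chars
```

Note that the word boundary has to be written as "\\b" inside an R string; a single backslash would be read as a backspace escape rather than passed to the regex engine.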

Or, using gsub from base R, keep the first two letters of each name and drop the rest of it

gsub("\\b([a-z]{2})[^,]+", "\\1", df1$list_names)
#[1] "jo"                         "ad, sa"                     "re"                         "zh, mi, an"                
#[5] "ma, he, sc, jo, st, ja, na"
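If the regular expressions feel opaque, a more explicit route is to split each row on the commas and take a substring of every piece. The following is a minimal sketch of that idea, assuming names are always separated by a comma with optional whitespace; it is an alternative to the approaches above rather than something taken from the answer.

```r
# Rebuild the example data from the question
id <- seq(1, 5)
list_names <- c("john",
                "adam, sally",
                "rebecca",
                "zhang, mike, antonio",
                "mark, henry, scott, john, steve, jason, nancy")
df1 <- data.frame(id, list_names)

# Split each row into individual names (assumed separator: comma + optional spaces)
parts <- strsplit(as.character(df1$list_names), ",\\s*")

# For each row, take characters 1-2 of every name and join them with ", "
df1$two_chars <- vapply(
  parts,
  function(nm) toString(substr(trimws(nm), 1, 2)),
  character(1)
)
df1$two_chars
```

Unlike the "[a-z]{2}" patterns, substr() does not care about letter case, so this variant would also handle names that start with a capital letter.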
