Reputation: 1145
The data frame df1 contains two columns: id and list_names
id <- seq(1,5)
list_names <- c("john",
"adam, sally",
"rebecca",
"zhang, mike, antonio",
"mark, henry, scott, john, steve, jason, nancy")
df1 <- data.frame(id, list_names)
I need to add a column that contains the first two characters of every name in list_names.
For example, the row "adam, sally" should become "ad, sa".
Note that the number of names per row is not fixed, so the solution must not assume a particular count.
Upvotes: 2
Views: 611
Reputation: 2640
In a for loop, split each observation with strsplit() on ', ', take the first two characters of each piece with substr(), and then paste() the pieces back together:
for (g in df1$list_names) {
  print(
    paste(substr(unlist(strsplit(g, ', ')), 1, 2), collapse = ', ')
  )
}
[1] "jo"
[1] "ad, sa"
[1] "re"
[1] "zh, mi, an"
[1] "ma, he, sc, jo, st, ja, na"
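To make the nested call easier to follow, here is a minimal sketch that runs the same three steps one at a time on a single string. The variable names parts, prefixes and result are only illustrative and do not appear in the code above.

```r
g <- "zhang, mike, antonio"

# 1. Split on ", "; strsplit() returns a list, so unlist() flattens it
parts <- unlist(strsplit(g, ", "))

# 2. Keep characters 1 to 2 of each piece (substr() is vectorised)
prefixes <- substr(parts, 1, 2)

# 3. Join the shortened pieces back into one string
result <- paste(prefixes, collapse = ", ")
print(result)
```

Because strsplit() returns however many pieces the string contains, the number of names per row never has to be specified.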
Or you can do this in one line with sapply():
df1$new_list_names = sapply(df1$list_names, function(g) paste(substr(unlist(strsplit(as.character(g), ', ')),1,2), collapse = ', '))
> df1
  id                                    list_names             new_list_names
1  1                                          john                         jo
2  2                                   adam, sally                     ad, sa
3  3                                       rebecca                         re
4  4                          zhang, mike, antonio                 zh, mi, an
5  5 mark, henry, scott, john, steve, jason, nancy ma, he, sc, jo, st, ja, na
Upvotes: 2
Reputation: 887621
We can use str_extract_all() to extract the two characters that follow each word boundary:
library(stringr)
library(dplyr)
library(purrr)
df1 %>%
  mutate(two_chars = str_extract_all(list_names, "\\b[a-z]{2}") %>%
           map_chr(toString))
#  id                                    list_names                  two_chars
#1  1                                          john                         jo
#2  2                                   adam, sally                     ad, sa
#3  3                                       rebecca                         re
#4  4                          zhang, mike, antonio                 zh, mi, an
#5  5 mark, henry, scott, john, steve, jason, nancy ma, he, sc, jo, st, ja, na
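If installing the three packages is not an option, the same extract-then-join idea can be sketched in base R. This is a swapped-in alternative, not the stringr code above: it uses gregexpr() and regmatches() instead of str_extract_all(), and vapply() instead of map_chr(). The perl = TRUE argument and the names x, hits and pieces are assumptions made for this sketch.

```r
x <- c("john", "adam, sally", "zhang, mike, antonio")

# gregexpr() locates every match of the pattern in each string;
# perl = TRUE selects the PCRE engine (an assumption of this sketch)
hits <- gregexpr("\\b[a-z]{2}", x, perl = TRUE)

# regmatches() pulls out the matched text: one character vector per input string
pieces <- regmatches(x, hits)

# toString() joins each vector with ", "; vapply() enforces a character result
two_chars <- vapply(pieces, toString, character(1))
print(two_chars)
```

Here x is a plain character vector; if list_names is stored as a factor, wrapping it in as.character() first would likely be needed.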
Or, using gsub():
gsub("\\b([a-z]{2})[^,]+", "\\1", df1$list_names)
#[1] "jo" "ad, sa" "re" "zh, mi, an"
#[5] "ma, he, sc, jo, st, ja, na"
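The pattern is dense, so here is a commented sketch of how its parts seem to fit together, run on a small vector. The names x and short are illustrative, and perl = TRUE is an assumption added for this sketch; the call above uses the default regex engine.

```r
x <- c("adam, sally", "zhang, mike, antonio")

# \\b          word boundary: the start of each name
# ([a-z]{2})   capture group 1: the first two lowercase letters
# [^,]+        the rest of the name, up to (not including) the next comma
# "\\1"        replace the whole name with just the captured two letters
short <- gsub("\\b([a-z]{2})[^,]+", "\\1", x, perl = TRUE)
print(short)
```

Since the character class is [a-z], a name that starts with an uppercase letter would not match and would be left unchanged; a broader class such as [[:alpha:]] might be needed for such data.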
Upvotes: 3