Reputation: 797
I would like to extract the first three letters of each word in every row of df and join them with an underscore, as below.
Example:
df <- data.frame(name = c('Jame Bond', "Maria Taylor", "Micheal Balack"))
df
name
1 Jame Bond
2 Maria Taylor
3 Micheal Balack
Desired output:
df_new
name
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal
Any suggestions for doing this with the tidyverse?
Upvotes: 4
Views: 2109
Reputation: 887961
In base R, we can use sub. Capture ((...)) the first three non-space characters (\\S) from the start of the string (^), followed by zero or more non-whitespace characters and one whitespace character (\\S*\\s), then capture the next three non-whitespace characters. In the replacement, specify the backreferences to the captured groups (\\1, \\2) and insert an underscore (_) between them.
df$name <- sub("^(\\S{3})\\S*\\s(\\S{3}).*", "\\1_\\2", df$name)
df$name
[1] "Jam_Bon" "Mar_Tay" "Mic_Bal"
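A quick way to see how this sub() pattern behaves beyond the three example rows is to run it on a few edge cases. The names below are hypothetical, not from the question. Because sub() returns non-matching strings unchanged, a single-word name (or one whose first word is shorter than three characters) passes through untouched, and the trailing .* discards everything after the second word.

```r
# Hypothetical test names: a normal two-word name, a single word,
# a three-word name, and a name whose first word has only 2 letters.
nm <- c("Jame Bond", "Cher", "Jean Claude Van Damme", "Al Pacino")

out <- sub("^(\\S{3})\\S*\\s(\\S{3}).*", "\\1_\\2", nm)
# returns c("Jam_Bon", "Cher", "Jea_Cla", "Al Pacino"):
# non-matches are left as-is, and words after the second are dropped
print(out)
```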
Upvotes: 1
Reputation: 685
An alternative method using tidyr functions:
library(tidyr)

# the native pipe |> requires R >= 4.1
df |>
  extract(name, c("x1", "x2"), "(\\w{3}).*\\s(\\w{3})") |>
  unite(col = "name", x1, x2, sep = "_")
Giving:
name
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal
Note that this assumes every first name and surname has at least three characters; otherwise, replace the extract regex with "(\\w{1,3}).*\\s(\\w{1,3})".
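To compare the two regexes without loading tidyr, the sketch below uses base R's regexec() with perl = TRUE as a stand-in for extract(); this assumes both regex engines agree on these simple ASCII patterns. The names and the helper capture() are hypothetical. It also shows that the greedy .* makes the second capture come from the last word of a longer name, so a lazy .*? is one option if the second word is wanted instead.

```r
# Hypothetical names: two-word, short first name, three-plus words.
nm <- c("Jame Bond", "Al Pacino", "Jean Claude Van Damme")

# Hypothetical helper: apply a two-group pattern and join the groups
# with "_", returning NA when the pattern does not match.
capture <- function(x, pattern) {
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # a successful match is c(full, group1, group2); a failure is character(0)
  vapply(m, function(g) {
    if (length(g) == 0) NA_character_ else paste(g[2], g[3], sep = "_")
  }, character(1), USE.NAMES = FALSE)
}

strict  <- capture(nm, "(\\w{3}).*\\s(\\w{3})")      # original pattern
lenient <- capture(nm, "(\\w{1,3}).*\\s(\\w{1,3})")  # allows short words
lazy    <- capture(nm, "(\\w{3}).*?\\s(\\w{3})")     # stops at the 2nd word

print(strict)   # "Al Pacino" fails; the long name yields "Jea_Dam"
print(lenient)  # "Al Pacino" becomes "Al_Pac"
print(lazy)     # the long name yields "Jea_Cla"
```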
Upvotes: 2
Reputation: 76
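library(stringr)
library(dplyr)
library(purrr)  # map_chr() comes from purrr, not dplyr

df$name %>%
  # three letters at the start of the string or right after a space
  str_extract_all("(?<=(^|[:space:]))[:alpha:]{3}") %>%
  map_chr(~ str_c(.x, collapse = "_"))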
The stringr cheatsheet is very useful for working through these types of problems.
https://www.rstudio.com/resources/cheatsheets/
Created on 2022-03-26 by the reprex package (v2.0.1)
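The same lookbehind idea also works in base R with gregexpr(perl = TRUE), shown here as a package-free sketch on hypothetical names. Unlike the two-group answers, it keeps a three-letter prefix for every word. Because [[:alpha:]]{3} needs three letters in a row, words shorter than three letters are dropped rather than shortened; the stringr pattern above should behave the same way, since it uses the same lookbehind.

```r
# Hypothetical names: two words, four words, and a 2-letter first name.
nm <- c("Jame Bond", "Jean Claude Van Damme", "Al Pacino")

# Match three letters at the start of the string or right after whitespace.
hits <- regmatches(nm, gregexpr("(?<=^|\\s)[[:alpha:]]{3}", nm, perl = TRUE))

# Join each row's prefixes with "_"
out <- vapply(hits, paste, character(1), collapse = "_", USE.NAMES = FALSE)
print(out)  # "Al" is skipped entirely, so the last row is just "Pac"
```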
Upvotes: 6
Reputation: 76
You can try this with dplyr::rowwise(), stringr::str_split() and stringr::str_sub():
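library(dplyr)
library(stringr)

df_new <- df %>%
  rowwise() %>%
  mutate(name = paste(
    # split the name on spaces and keep the first 3 characters of each word
    unlist(
      lapply(str_split(name, ' '), function(x) {
        str_sub(x, 1, 3)
      })
    ),
    collapse = "_"
  ))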
This gives the expected result:
> df_new
# A tibble: 3 x 1
# Rowwise:
name
<chr>
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal
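rowwise() is not strictly needed here: strsplit() (like str_split()) already returns one list element per row, so a single vapply() over that list gives the result as a plain character vector. A base R sketch, with one hypothetical extra name added to show that substr() keeps words shorter than three characters as they are:

```r
# The question's names plus one hypothetical name with a 2-letter word.
nm <- c("Jame Bond", "Maria Taylor", "Micheal Balack", "Al Pacino")

# strsplit() gives one character vector of words per name;
# substr() truncates each word to at most 3 characters.
out <- vapply(strsplit(nm, " ", fixed = TRUE),
              function(words) paste(substr(words, 1, 3), collapse = "_"),
              character(1), USE.NAMES = FALSE)
print(out)
```

The resulting vector can be assigned back to df$name in the usual way.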
Upvotes: 4