Anh

Reputation: 797

How to get first n characters from a string in R

I would like to extract the first three letters of each word in every row of df, as shown below.

Example:

df <- data.frame(name = c('Jame Bond', "Maria Taylor", "Micheal Balack"))
df
            name
1      Jame Bond
2   Maria Taylor
3 Micheal Balack

Desired output:

df_new 
        name
1      Jam_Bon
2      Mar_Tay
3      Mic_Bal

Any suggestions for doing this with the tidyverse?

Upvotes: 4

Views: 2109

Answers (4)

akrun

Reputation: 887961

In base R, we can use sub: capture (with (...)) the first three non-whitespace characters (\\S{3}) from the start of the string (^), skip zero or more further non-whitespace characters followed by one whitespace character (\\S*\\s), then capture the next three non-whitespace characters. In the replacement, use the backreferences (\\1, \\2) to the captured groups and insert an underscore (_) between them.

df$name <- sub("^(\\S{3})\\S*\\s(\\S{3}).*", "\\1_\\2", df$name)
df$name
[1] "Jam_Bon" "Mar_Tay" "Mic_Bal"
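As a minimal sketch of how that pattern behaves, the same sub() call can be wrapped in a small helper. The function name abbreviate_name is made up for illustration and is not part of the answer above. Note that sub() returns a string unchanged when the pattern does not match, so single-word names or words shorter than three characters pass through as-is.

```r
# Hypothetical helper (name chosen for illustration only); it wraps the
# sub() call shown above.
abbreviate_name <- function(x) {
  sub("^(\\S{3})\\S*\\s(\\S{3}).*", "\\1_\\2", x)
}

# The names from the question
abbreviate_name(c("Jame Bond", "Maria Taylor", "Micheal Balack"))
# [1] "Jam_Bon" "Mar_Tay" "Mic_Bal"

# No whitespace, so the pattern does not match and the input is returned unchanged
abbreviate_name("Cher")
# [1] "Cher"
```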

Upvotes: 1

awaji98

Reputation: 685

An alternative method using tidyr functions:

library(tidyr)

df |> 
  extract(name, c("x1","x2"), "(\\w{3}).*\\s(\\w{3})") |> 
  unite(col = "name",x1,x2, sep = "_")

Giving:

     name
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal

Note that this assumes every first name and surname has at least 3 characters; otherwise, replace the extract regex with "(\\w{1,3}).*\\s(\\w{1,3})".
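A short sketch of that relaxed pattern, using made-up sample data (the data frame short and the variable res are assumptions for illustration, not part of the question):

```r
library(tidyr)

# Made-up sample data: one name whose words are shorter than three characters
short <- data.frame(name = c("Al Wu", "Jame Bond"))

res <- short |>
  extract(name, c("x1", "x2"), "(\\w{1,3}).*\\s(\\w{1,3})") |>
  unite(col = "name", x1, x2, sep = "_")

res$name
# [1] "Al_Wu"   "Jam_Bon"
```

With the original "(\\w{3}).*\\s(\\w{3})" pattern, "Al Wu" would not match, and extract() would return NA for both pieces.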

Upvotes: 2

acammack1234

Reputation: 76

library(stringr)
library(dplyr)
library(purrr)  # provides map_chr()

df$name %>% 
  str_extract_all("(?<=(^|[:space:]))[:alpha:]{3}") %>% 
  map_chr(~ str_c(.x, collapse = "_"))

The stringr cheatsheet is very useful for working through these types of problems. https://www.rstudio.com/resources/cheatsheets/

Created on 2022-03-26 by the reprex package (v2.0.1)

Upvotes: 6

Ao Sun

Reputation: 76

You can try this with dplyr::rowwise(), stringr::str_split() and stringr::str_sub():

library(dplyr)
library(stringr)

df_new <- df %>% 
  rowwise() %>% 
  mutate(name = paste(
    unlist(
      lapply(str_split(name, ' '), function(x){
        str_sub(x, 1, 3)
      })
    ), 
    collapse = "_"
  ))

This gives the expected result:

> df_new
# A tibble: 3 x 1
# Rowwise: 
  name   
  <chr>  
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal
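Since str_split() and str_sub() are vectorised, the same idea can probably be written without rowwise(). The following is only a sketch of that variant; it swaps in purrr::map_chr(), which is not used in the code above, to collapse each row's pieces:

```r
library(dplyr)
library(stringr)
library(purrr)

df <- data.frame(name = c("Jame Bond", "Maria Taylor", "Micheal Balack"))

df_new <- df %>%
  mutate(name = map_chr(
    str_split(name, " "),                        # one character vector of words per row
    ~ str_c(str_sub(.x, 1, 3), collapse = "_")   # first 3 letters of each word, joined by "_"
  ))

df_new$name
# [1] "Jam_Bon" "Mar_Tay" "Mic_Bal"
```

Unlike the rowwise() version, the result here is a plain data frame rather than a rowwise tibble.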

Upvotes: 4
