r.user.05apr

Reputation: 5456

Split string at separator using stringr::str_replace

I need a regular expression that extracts the third element of an underscore-separated string. The number of underscores varies. I can do this with str_split, but is there a way to get the same result with str_replace? The desired result is x = AAAA, BBBB, CCCC, DDDD, ideally keeping the capture groups written with ().

library(tidyverse)
library(stringr)

d <- enframe(c("asfe_01_AAAA_fses_feee",
               "asfe_87_BBBB_fses_feee",
               "99_fesf_CCCC_feee",
               "99_fesf_DDDD"),
             name = NULL, value = "txt")

d %>%
  mutate(x = str_replace(txt, "(.+)_(.+)_(.+)_*(.*)_*(.*)", "\\3"),
         want_strsplit = str_split(txt, "_", simplify = TRUE)[, 3])

#  txt                    x     want_strsplit
#  <chr>                  <chr> <chr>        
#1 asfe_01_AAAA_fses_feee feee  AAAA         
#2 asfe_87_BBBB_fses_feee feee  BBBB         
#3 99_fesf_CCCC_feee      feee  CCCC         
#4 99_fesf_DDDD           DDDD  DDDD    

Upvotes: 1

Views: 192

Answers (4)

s_baldur

Reputation: 33498

d %>%
  mutate(x = str_replace(txt, "^([^_]+)_([^_]+)_([^_]+).*", "\\3"))
  • [^_] matches any single character except _, so each ([^_]+) group stops at the next underscore.
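
To see why the pattern from the question returns feee for most rows, here is a small side-by-side sketch. It uses base R's sub() with perl = TRUE as a stand-in for str_replace (my assumption, so that it runs without any packages); the vector txt and the names greedy and bounded are only for illustration.

```r
txt <- c("asfe_01_AAAA_fses_feee",
         "asfe_87_BBBB_fses_feee",
         "99_fesf_CCCC_feee",
         "99_fesf_DDDD")

# Pattern from the question: (.+) is greedy and may swallow underscores,
# so the first group grows as far as it can and group 3 ends up being
# the last element instead of the third.
greedy <- sub("(.+)_(.+)_(.+)_*(.*)_*(.*)", "\\3", txt, perl = TRUE)

# Negated class: ([^_]+) cannot cross an underscore, so groups 1 to 3
# line up with the first three elements.
bounded <- sub("^([^_]+)_([^_]+)_([^_]+).*", "\\3", txt, perl = TRUE)

print(greedy)   # "feee" "feee" "feee" "DDDD"
print(bounded)  # "AAAA" "BBBB" "CCCC" "DDDD"
```

The greedy result matches the x column shown in the question, which suggests the wrong output comes from the (.+) groups rather than from anything specific to str_replace.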

Upvotes: 2

jay.sf

Reputation: 72653

You could just exploit strsplit a little bit more.

mapply(`[`, strsplit(d$txt, "_"), 3)
# [1] "AAAA" "BBBB" "CCCC" "DDDD"

For the whole thing:

splt <- strsplit(d$txt, "_")
cbind(d, x=mapply(`[`, splt, lengths(splt)), want_strsplit=mapply(`[`, splt, 3))
#                      txt    x want_strsplit
# 1 asfe_01_AAAA_fses_feee feee          AAAA
# 2 asfe_87_BBBB_fses_feee feee          BBBB
# 3      99_fesf_CCCC_feee feee          CCCC
# 4           99_fesf_DDDD DDDD          DDDD

Upvotes: 4

akrun

Reputation: 886978

An option with sub

sub("^(([^_]+_){2})([^_]+).*", "\\3", d$txt)
#[1] "AAAA" "BBBB" "CCCC" "DDDD"

Upvotes: 2

boski

Reputation: 2467

With str_replace

d %>% mutate(x = str_replace(txt, "^((?:[^_]*_){2})([a-zA-Z]+).*", "\\2"))
# A tibble: 4 x 2
#   txt                    x
#   <chr>                  <chr>
# 1 asfe_01_AAAA_fses_feee AAAA
# 2 asfe_87_BBBB_fses_feee BBBB
# 3 99_fesf_CCCC_feee      CCCC
# 4 99_fesf_DDDD           DDDD

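The first group captures the first two elements together with their trailing underscores. The second group captures the run of letters that follows, i.e. the third element, and the trailing .* consumes the rest of the string so that it is dropped from the result.
If the third element can also contain digits, you can generalize the second group with [[:alnum:]]: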

d %>% mutate(x = str_replace(txt, "^((?:[^_]*_){2})([[:alnum:]]+).*", "\\2"))
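
If it helps to check what each group actually captures, the matches can be inspected directly. This is just a sketch in base R, using regexec() with perl = TRUE in place of stringr (an assumption on my part, to keep it dependency-free); txt, pattern and groups are illustrative names.

```r
txt <- c("asfe_01_AAAA_fses_feee", "99_fesf_DDDD")
pattern <- "^((?:[^_]*_){2})([[:alnum:]]+).*"

# regexec() reports the full match plus every capture group;
# regmatches() turns those positions into strings, one vector per input.
m <- regmatches(txt, regexec(pattern, txt, perl = TRUE))

# One row per input: column 1 is the full match, column 2 is group 1
# (first two elements with their underscores), column 3 is group 2.
groups <- do.call(rbind, m)

print(groups[, 2])  # "asfe_01_" "99_fesf_"
print(groups[, 3])  # "AAAA" "DDDD"
```

Because (?:...) is non-capturing, the repeated inner part does not get a number of its own, which is why the third element is \\2 here rather than \\3.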

Upvotes: 3
