Reputation: 5456
I need help with a regular expression that extracts the third element of an underscore-separated string. The number of underscores varies. I can do it with str_split, but is there a way to get the same result with str_replace, as attempted below?
(The desired result is x = AAAA, BBBB, CCCC, DDDD, if possible keeping the grouping with ().)
library(tidyverse)
library(stringr)
d <- enframe(c("asfe_01_AAAA_fses_feee",
               "asfe_87_BBBB_fses_feee",
               "99_fesf_CCCC_feee",
               "99_fesf_DDDD"),
             name = NULL, value = "txt")
d %>%
  mutate(x = str_replace(txt, "(.+)_(.+)_(.+)_*(.*)_*(.*)", "\\3"),
         want_strsplit = str_split(txt, "_", simplify = TRUE)[, 3])
#   txt                    x     want_strsplit
#   <chr>                  <chr> <chr>
# 1 asfe_01_AAAA_fses_feee feee  AAAA
# 2 asfe_87_BBBB_fses_feee feee  BBBB
# 3 99_fesf_CCCC_feee      feee  CCCC
# 4 99_fesf_DDDD           DDDD  DDDD
Upvotes: 1
Views: 192
Reputation: 33498
d %>%
  mutate(x = str_replace(txt, "^([^_]+)_([^_]+)_([^_]+).*", "\\3"))
Here [^_] stands for any character except _, so each group is confined to a single field.
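To see why the pattern in the question returns the last field, the following sketch compares the two patterns side by side. It uses base R's sub with perl = TRUE instead of str_replace, which is an assumption made only to keep the example free of package dependencies; the backtracking behaviour should be the same for these patterns.

```r
txt <- c("asfe_01_AAAA_fses_feee",
         "asfe_87_BBBB_fses_feee",
         "99_fesf_CCCC_feee",
         "99_fesf_DDDD")

# Greedy `.+` may match underscores, so the first group expands as far as it
# can and the third group is pushed onto the last field.
greedy <- sub("(.+)_(.+)_(.+)_*(.*)_*(.*)", "\\3", txt, perl = TRUE)

# `[^_]+` cannot cross an underscore, so the third group is pinned to the
# third field; the trailing `.*` swallows whatever follows.
negated <- sub("^([^_]+)_([^_]+)_([^_]+).*", "\\3", txt, perl = TRUE)

print(greedy)   # "feee" "feee" "feee" "DDDD"
print(negated)  # "AAAA" "BBBB" "CCCC" "DDDD"
```

The only difference between the two calls is the character class, which is what stops a group from absorbing neighbouring fields.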
Upvotes: 2
Reputation: 72653
You could just exploit strsplit a little bit more.
mapply(`[`, strsplit(d$txt, "_"), 3)
# [1] "AAAA" "BBBB" "CCCC" "DDDD"
For the whole thing:
splt <- strsplit(d$txt, "_")
cbind(d, x = mapply(`[`, splt, lengths(splt)), want_strsplit = mapply(`[`, splt, 3))
#                      txt    x want_strsplit
# 1 asfe_01_AAAA_fses_feee feee          AAAA
# 2 asfe_87_BBBB_fses_feee feee          BBBB
# 3      99_fesf_CCCC_feee feee          CCCC
# 4           99_fesf_DDDD DDDD          DDDD
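A possible variation on the same idea, sketched here with vapply so that the result type is checked. The input "only_two" is a made-up value, added only to show what happens when a string has fewer than three fields.

```r
# "only_two" is a hypothetical extra input with just two fields.
txt <- c("asfe_01_AAAA_fses_feee", "99_fesf_DDDD", "only_two")

# fixed = TRUE treats "_" as a literal string rather than a regex.
parts <- strsplit(txt, "_", fixed = TRUE)

# Index each split vector at position 3; indexing past the end yields NA.
third <- vapply(parts, `[`, character(1), 3)

print(third)  # "AAAA" "DDDD" NA
```

Unlike the regex replacements, which return the input unchanged when the pattern does not match, the split-and-index approach marks a missing third field with NA.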
Upvotes: 4
Reputation: 886978
An option with sub
sub("^(([^_]+_){2})([^_]+).*", "\\3", d$txt)
#[1] "AAAA" "BBBB" "CCCC" "DDDD"
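The {2} quantifier is what selects the third field, so the pattern can be built for any position. The helper below is a minimal sketch of that idea; nth_field is a hypothetical name, not a function from base R or stringr, and it assumes the separator is always an underscore.

```r
# Hypothetical helper: return the n-th underscore-separated field of x.
nth_field <- function(x, n) {
  stopifnot(length(n) == 1L, n >= 1)
  # Skip (n - 1) "field + underscore" pairs, capture the next field,
  # then let `.*` consume the remainder of the string.
  pattern <- sprintf("^(?:[^_]*_){%d}([^_]*).*$", as.integer(n) - 1L)
  sub(pattern, "\\1", x, perl = TRUE)
}

txt <- c("asfe_01_AAAA_fses_feee",
         "asfe_87_BBBB_fses_feee",
         "99_fesf_CCCC_feee",
         "99_fesf_DDDD")

print(nth_field(txt, 3))  # "AAAA" "BBBB" "CCCC" "DDDD"
```

Because sub leaves non-matching strings untouched, a string with too few fields (for example nth_field("a_b", 3)) comes back unchanged rather than as NA.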
Upvotes: 2
Reputation: 2467
With str_replace:
d %>% mutate(x = str_replace(txt, "^((?:[^_]*_){2})([a-zA-Z]+).*", "\\2"))
# A tibble: 4 x 2
#   txt                    x
#   <chr>                  <chr>
# 1 asfe_01_AAAA_fses_feee AAAA
# 2 asfe_87_BBBB_fses_feee BBBB
# 3 99_fesf_CCCC_feee      CCCC
# 4 99_fesf_DDDD           DDDD
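The first group, ((?:[^_]*_){2}), captures the first two fields together with their trailing underscores. The second group, ([a-zA-Z]+), captures the run of letters that immediately follows, and the trailing .* consumes the rest of the string so that only the second group survives the replacement.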
If the third element can also contain digits, you can generalize the second group with [[:alnum:]]:
d %>% mutate(x = str_replace(txt, "^((?:[^_]*_){2})([[:alnum:]]+).*", "\\2"))
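The difference between the two character classes only shows up when the third field contains digits. The sketch below uses base R's sub with perl = TRUE as a stand-in for str_replace, and the two input strings are made-up examples rather than data from the question.

```r
# Hypothetical inputs whose third field contains digits.
txt <- c("ab_cd_12EF_gh", "ab_cd_E1F_gh")

# Letters only: a leading digit prevents a match (input returned unchanged),
# and a digit in the middle cuts the capture short.
letters_only <- sub("^((?:[^_]*_){2})([a-zA-Z]+).*", "\\2", txt, perl = TRUE)

# Letters and digits: the whole third field is captured.
alnum <- sub("^((?:[^_]*_){2})([[:alnum:]]+).*", "\\2", txt, perl = TRUE)

print(letters_only)  # "ab_cd_12EF_gh" "E"
print(alnum)         # "12EF" "E1F"
```

If the third field may contain other characters as well, [^_]+ in place of the second group would accept anything up to the next underscore.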
Upvotes: 3