Reputation: 1335
Basically what the title says: I have a vector of character strings, and for each element I want to extract everything between the first and third period. E.g.
s <- c("random.0.0.word.1.0", "different.0.02.words.15.6", "different.0.1.words.4.2")
The result should be:
"0.0" "0.02" "0.1"
I have tried adapting code from here and here but failed. Any advice much appreciated!
Upvotes: 2
Views: 501
Reputation: 47340
Here's a way with unglue, which some might find less intimidating:
library(unglue)
s <- c("random.0.0.word.1.0", "different.0.02.words.15.6", "different.0.1.words.4.2")
unglue_vec(s, "{=[^.]+}.{x}.{=[^.]+}.{=[^.]+}.{=[^.]+}")
#> [1] "0.0" "0.02" "0.1"
Created on 2020-01-16 by the reprex package (v0.3.0)
The subpatterns [^.]+ are sequences of "non dots", not named (nothing on the lhs of =) because we don't want to extract them.
Upvotes: 1
Reputation: 389175
We can use sub to capture as little as possible between the 1st and 3rd period.
sub(".*?\\.(.*?\\..*?)\\..*", "\\1", s)
#[1] "0.0" "0.02" "0.1"
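To see why the lazy quantifier .*? matters here, this small sketch contrasts it with a greedy version of the same pattern. It passes perl = TRUE (an assumption, not part of the answer above) so that the backtracking behaviour is the PCRE one; the names lazy and greedy are only for illustration.

```r
s <- c("random.0.0.word.1.0", "different.0.02.words.15.6", "different.0.1.words.4.2")

# Lazy: each .*? stops at the earliest dot that still lets the rest match,
# so the captured group sits between the 1st and 3rd dots.
lazy <- sub(".*?\\.(.*?\\..*?)\\..*", "\\1", s, perl = TRUE)

# Greedy: the leading .* swallows as much as it can,
# so the captured group ends up between the last three dots instead.
greedy <- sub(".*\\.(.*\\..*)\\..*", "\\1", s, perl = TRUE)

lazy
#[1] "0.0" "0.02" "0.1"
greedy
#[1] "word.1" "words.15" "words.4"
```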
Upvotes: 1
Reputation: 887571
We can match the characters that are not a . ([^.]+) from the start (^) of the string, followed by a dot (\\.), then capture all the characters between the first and the third dot as a group ((...)), and use the backreference (\\1) to that captured group in the replacement.
sub("^[^.]+\\.([^.]+\\.[^.]+)\\..*", "\\1", s)
#[1] "0.0" "0.02" "0.1"
Or it can also be done with substring after getting the positions of the dots:
lst1 <- gregexpr('.', s, fixed = TRUE)
substring(s, sapply(lst1, `[`, 1) + 1, sapply(lst1, `[`, 3) - 1)
#[1] "0.0" "0.02" "0.1"
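As a sketch of what the position-based approach is doing, the snippet below looks at the dot positions for the first string and then uses vapply instead of sapply so the result type is checked. The names dots, first and third are illustrative, and it assumes every string contains at least three dots.

```r
s <- c("random.0.0.word.1.0", "different.0.02.words.15.6", "different.0.1.words.4.2")

# One integer vector of dot positions per string (fixed = TRUE: "." is literal)
dots <- gregexpr(".", s, fixed = TRUE)

# Dot positions in the first string: 7 9 11 16 18
as.integer(dots[[1]])

# Position of the 1st and 3rd dot in every string
first <- vapply(dots, function(p) p[1], integer(1))
third <- vapply(dots, function(p) p[3], integer(1))

# Keep what lies strictly between those two dots
res <- substring(s, first + 1, third - 1)
res
#[1] "0.0" "0.02" "0.1"
```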
Upvotes: 2
Reputation: 60160
An alternative way to do this, without using any fancy regex features, is just to split on . and then grab the bits we need:
library(stringr)
library(purrr)
# Split each string on literal dots, then rejoin the 2nd and 3rd pieces
str_split(s, "\\.") %>%
  map_chr(~ paste0(.[2:3], collapse = "."))
#[1] "0.0" "0.02" "0.1"
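The same split-and-rejoin idea can be sketched in base R, with no extra packages. This is only an illustration: parts and res are made-up names, and it assumes each string has at least three dot-separated pieces.

```r
s <- c("random.0.0.word.1.0", "different.0.02.words.15.6", "different.0.1.words.4.2")

# fixed = TRUE treats "." as a literal dot, so no escaping is needed
parts <- strsplit(s, ".", fixed = TRUE)

# Rejoin the 2nd and 3rd pieces of each string with a dot
res <- vapply(parts, function(p) paste(p[2:3], collapse = "."), character(1))
res
#[1] "0.0" "0.02" "0.1"
```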
Upvotes: 1