Reputation: 1091
I am trying to keep only the first section of each string (which may include - and digits), up to and including the forward slash that follows it.
I have the following string:
x <- c('/youtube.com/videos/cats', '/google.com/images/dogs', 'bbc.com/movies')
/youtube.com/videos/cats
/google.com/images/dogs
bbc.com/movies
So the result would look like this:
/youtube.com/
/google.com/
bbc.com/
For reference I am using R 3.6
I have tried positive lookbehinds and the closest I got was this: ^\/[^\/]*
Any help appreciated
In the bbc.com/movies example, the string does not start with a forward slash /, but I still want to keep the bbc.com part in the match.
Upvotes: 0
Views: 219
Reputation: 627110
You can use sub here to perform a single regex replacement:
sub('^(/?[^/]*/).*', '\\1', x)
See the regex demo.
Details
^ - start of string
(/?[^/]*/) - Capturing group 1 (\1 in the replacement pattern): an optional /, then zero or more chars other than /, and then a /
.* - any zero or more chars, as many as possible.
See an R test online:
test <- c("/youtube.com/videos/cats", "/google.com/images/dogs", "bbc.com/movies")
sub('^(/?[^/]*/).*', '\\1', test)
# => [1] "/youtube.com/" "/google.com/" "bbc.com/"
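Since [^/] accepts any character except a slash, the same pattern should also cope with the hyphens and digits mentioned in the question. A small sketch to check that, using made-up inputs (the vector `extra` and its values are assumptions for illustration, not data from the question):

```r
# Hypothetical inputs: digits, a hyphen, and one string with no slash at all
extra <- c("/my-site123.com/a/b", "abc-1.org/x", "plain.com")

# Same pattern as above: keep the optional leading "/", the first run of
# non-slash characters, and the slash that follows it
res <- sub("^(/?[^/]*/).*", "\\1", extra)
print(res)

# The last element contains no "/", so the pattern does not match and
# sub() returns that element unchanged.
```

If strings without any slash can occur, it may be worth deciding up front whether leaving them untouched is the behaviour you want.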
Upvotes: 1
Reputation: 11596
Using base R (note that this pattern assumes every domain ends in .com):
gsub('(\\/?.*\\.com\\/).*', '\\1', x)
[1] "/youtube.com/" "/google.com/" "bbc.com/"
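One thing to watch with this pattern: the first .* is greedy, so if ".com/" appears more than once in a string, the match may run on to the last occurrence. A hedged sketch of a lazy variant, using a made-up input (`z` is an assumption for illustration) and perl = TRUE so the lazy quantifier follows PCRE semantics:

```r
# Hypothetical input with ".com/" appearing twice, plus one string from the question
z <- c("/a.com/b.com/c", "/youtube.com/videos/cats")

# Greedy version from above, printed for comparison
greedy <- sub("(\\/?.*\\.com\\/).*", "\\1", z)
print(greedy)

# Lazy variant: .*? stops at the first ".com/"
lazy <- sub("^(/?.*?\\.com/).*", "\\1", z, perl = TRUE)
print(lazy)
```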
Upvotes: 0
Reputation: 4344
An alternative would be to use the rebus package:
library(rebus)
library(stringi)
t <- c("/youtube.com/videos/cats"," /google.com/images/dogs"," bbc.com/movie")
pattern <- zero_or_more("/") %R% one_or_more(ALPHA) %R% DOT %R% one_or_more(ALPHA) %R% zero_or_more("/")
stringi::stri_extract_first_regex(t, pattern)
[1] "/youtube.com/" "/google.com/" "bbc.com/"
Upvotes: -1
Reputation: 719
First, great username. Try this: you can leverage the fact that str_extract only pulls out the first match. Assuming all URLs have the form letters.letters, this pattern should work. Let me know if you have numbers in any of them.
library(stringr)
c("/youtube.com/videos/cats",
"/google.com/images/dogs",
"bbc.com/movies") %>%
str_extract(., "/?\\w+\\.\\w+/")
produces
"/youtube.com/" "/google.com/" "bbc.com/"
Upvotes: 1
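Regarding the str_extract pattern above: \w covers letters, digits and the underscore, but not the hyphen, so a name like my-site.com would be cut down to site.com/. Below is a rough base R sketch of the same first-match idea with the character class widened to [\w-]. The helper name `extract_first` and the input vector `urls` are hypothetical, chosen only for illustration:

```r
# Hypothetical helper: return the first regex match per element, NA if none
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)   # position of first match, -1 if none
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)        # regmatches() returns matched elements only
  out
}

# Hypothetical inputs: a hyphen, digits, and a string with no match
urls <- c("/my-site.com/videos", "news24.org/world", "noslash")

# [\w-] allows letters, digits, underscore and hyphen on both sides of the dot
res <- extract_first(urls, "/?[\\w-]+\\.[\\w-]+/")
print(res)
```

This still assumes a single dot in the domain part; a name with several dots would need a different pattern.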