Subsetting a value based on partial pattern

Question

I'm trying to use a regular expression to subset out the URL happy_to-learn.com.

As I'm really new to regex, could someone explain why my code does not work?

library(stringr)
x <- c("happy_to-learn.com", "His_is-omitted.net")
# backslashes must be doubled inside an R string literal
str_subset(x, "^[a-zA-Z](\\_|\\-)*\\.com$")

My understanding is that the portion ^[a-zA-Z](\_|\-)* means: "start at a letter from a to z or A to Z, followed by either _ or -, matched zero or more times."

However, is it possible to continue from this code by adding the final part of the value I wish to subset? For example, \.com$ matches all values that end with .com.

Is there something like "^[a-zA-Z](\_|\-)*...\.com$" in regex?

akrun · Accepted Answer

We need to match one or more letters with +, because the _ or - does not come immediately after the first letter.

str_subset(x, "^[a-zA-Z]+(\\_|\\-).*\\.com$")
#[1] "happy_to-learn.com"

Also, the .* matches zero or more characters, since . can be any character. It consumes everything up to the literal . followed by 'com' at the end ($) of the string.
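To see the difference between the two patterns side by side, here is a minimal sketch. It uses base R's grep(..., value = TRUE) as a stand-in for str_subset so that it runs without any packages; the variable names original, fixed and strict are made up for illustration, and the last pattern is only an assumed stricter variant, not part of the answer above. Note that _ and - need no escaping outside a character class, so the escapes are dropped here.

```r
x <- c("happy_to-learn.com", "His_is-omitted.net")

# Attempt from the question: exactly ONE letter, then only _ or - repeated,
# then ".com". "happy..." has more letters after the first, so nothing matches.
original <- "^[a-zA-Z](_|-)*\\.com$"
res_original <- grep(original, x, value = TRUE)
print(res_original)

# Pattern from the answer: one or more letters, one _ or -, then any
# characters (.*), then a literal ".com" at the end of the string.
fixed <- "^[a-zA-Z]+(_|-).*\\.com$"
res_fixed <- grep(fixed, x, value = TRUE)
print(res_fixed)

# Assumed stricter variant: allow only letters, _ and - before ".com".
# Unlike .*, this rejects digits, spaces or extra dots in the name.
strict <- "^[a-zA-Z_-]+\\.com$"
res_strict <- grep(strict, x, value = TRUE)
print(res_strict)
```

The first call returns an empty character vector, while the other two keep only "happy_to-learn.com". Whether the permissive .* or the stricter character class is the better choice depends on which URLs should count as valid.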
