RoyValentine

Reputation: 143

Regular expression to find a word after "/" in a URL

I am trying to parse URLs in Ruby and return the ones where a given word appears after a "/" following the domain (.com, .org, etc.).

If I want to capture "questions" in a URL such as https://stackoverflow.com/questions, I also want to capture https://stackoverflow.com/blah/questions. But I do not want to capture https://stackoverflow.com/queStioNs (the match should be case-sensitive).

Currently my expression matches https://stackoverflow.com/questions, but not URLs where "questions" comes after one or more additional "/" segments.

The end of my regular expression is \bquestions\b.

I tried ([a-zA-Z]+\W{1}+\bjob\b|\bjob\b), but this only matches URLs with /questions and /blah/questions, not /blah/bleh/questions.

What am I doing wrong and how do I match what I need?

Upvotes: 0

Views: 282

Answers (2)

Rajesh Omanakuttan

Reputation: 6918

I don't know whether there is a simpler way, but here is my solution:

regexp = %r{^(https|http)?://\w+\.(com|org|edu)(/[a-z]+)*$}
match = "https://stackoverflow.com/blah/questions".match(regexp)
# match is nil when the URL doesn't fit the pattern (e.g. /queStioNs);
# group 3 holds only the last repetition of (/[a-z]+), here "/questions".
match && match[3]&.delete("/")

It will return 'questions'.
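
To check that this handles any number of path segments, the same pattern can be wrapped in a small helper and run against the URLs from the question. This is a sketch; the method name last_path_segment is hypothetical. Because (/[a-z]+)* repeats, any depth of lowercase segments is accepted, and the helper returns nil when the URL contains capitals or has no path.

```ruby
# Same pattern as above; (/[a-z]+)* repeats, so any number of lowercase
# path segments is accepted, and group 3 keeps only the last one.
REGEXP = %r{^(https|http)?://\w+\.(com|org|edu)(/[a-z]+)*$}

# Hypothetical helper: returns the last path segment without its "/",
# or nil if the URL doesn't match the pattern or has no path at all.
def last_path_segment(url)
  match = url.match(REGEXP)
  match && match[3]&.delete("/")
end

[
  "https://stackoverflow.com/questions",
  "https://stackoverflow.com/blah/questions",
  "https://stackoverflow.com/blah/bleh/questions",
  "https://stackoverflow.com/queStioNs"
].each do |url|
  p [url, last_path_segment(url)]
end
```

The first three URLs yield "questions"; /queStioNs yields nil because [a-z] does not accept capital letters.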

Update based on your comments below:

Use [\S]*(\/questions){1}$, which matches any URL whose path ends in /questions.
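
Applied to the example URLs from the question with Array#grep, the updated pattern keeps every URL ending in /questions, however many segments come before it, and skips /queStioNs because Ruby regex matching is case-sensitive by default. A sketch for illustration; note that it only matches when questions is the final segment.

```ruby
# Updated pattern from the answer: any non-whitespace characters,
# then "/questions" at the end of the line. Matching is case-sensitive.
pattern = /[\S]*(\/questions){1}$/

urls = [
  "https://stackoverflow.com/questions",
  "https://stackoverflow.com/blah/questions",
  "https://stackoverflow.com/blah/bleh/questions",
  "https://stackoverflow.com/queStioNs"
]

# Array#grep keeps the elements for which pattern === element is true.
matches = urls.grep(pattern)
puts matches
```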

Hope it helps :)

Upvotes: 0

khagler

Reputation: 4056

You don't actually need a regex for this; you can use the URI module instead:

require 'uri'

urls = ['https://stackoverflow.com/blah/questions', 'https://stackoverflow.com/queStioNs']

urls.each do |url|
  the_path = URI(url).path
  puts the_path if the_path.include?('questions')
end
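
One caveat: String#include? on the path also accepts substrings such as /myquestions. If questions must be a whole path segment, splitting the path on "/" is one possible variant. This is a sketch, not part of the original answer.

```ruby
require 'uri'

urls = [
  'https://stackoverflow.com/questions',
  'https://stackoverflow.com/blah/bleh/questions',
  'https://stackoverflow.com/queStioNs',
  'https://stackoverflow.com/myquestions'
]

# Keep URLs whose path has a segment exactly equal to "questions".
# Array#include? compares with ==, so the check is case-sensitive.
matching = urls.select do |url|
  URI(url).path.split('/').include?('questions')
end

puts matching
```

Only the first two URLs are kept: /queStioNs differs in case, and /myquestions is a different segment.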

Upvotes: 4
