mitchelllc

Reputation: 1657

Regex to match URLs that contain a string in the relative path but not in the domain

This is one of my interview questions. I didn't come up with a good enough solution and got rejected.

The question was:

What single regex matches all URLs that contain job (case insensitive) in the relative
path (not the domain) in the following list?

    - http://www.glassdoor.com/job/ABC
    - https://glassdoor.com/job/
    - HTTPs://job.com/test
    - Www.glassdoor.com/foo/bar/joBs
    - http://192.168.1.1/ABC/job
    - http://bankers.jobs/ABC/job

My solution used a negative lookbehind and a negative lookahead: /(?<!\.)job(?!\.)/i. This works for the list above. However, it fails for a URL such as HTTPs://jobs.com/test: the pattern matches it even though job appears only in the domain.
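
The failure can be reproduced with a short script. This is only an illustrative sketch in Python (the interview did not name a language); the names attempt, urls and results are made up for the example.

```python
import re

# The lookaround attempt from the question: "job" with no literal dot
# directly before or after it, matched case-insensitively.
attempt = re.compile(r"(?<!\.)job(?!\.)", re.IGNORECASE)

urls = [
    "http://www.glassdoor.com/job/ABC",
    "https://glassdoor.com/job/",
    "HTTPs://job.com/test",
    "Www.glassdoor.com/foo/bar/joBs",
    "http://192.168.1.1/ABC/job",
    "http://bankers.jobs/ABC/job",
    "HTTPs://jobs.com/test",  # the extra URL that breaks the attempt
]

results = {url: bool(attempt.search(url)) for url in urls}

for url, matched in results.items():
    print(f"{matched!s:5}  {url}")
```

The last URL is reported as a match because the "job" in jobs.com is preceded by "/" and followed by "s", so neither lookaround rejects it.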

What is the correct answer to this question? Thanks in advance for any suggestions!

Upvotes: 4

Views: 20831

Answers (4)

anubhava

Reputation: 786349

Try this regex:

\b(?:https?:\/\/)?[^\/:\n]+\/.*?job

RegEx Details:

  • \b: Word boundary
  • (?:https?:\/\/)?: Match optional http:// or https://
  • [^\/:\n]+: Match 1 or more characters that are not /, : or a newline
  • \/: Match a /
  • .*?job: Lazily match 0 or more characters, followed by the text job
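
A quick way to check this pattern against the URLs from the question is sketched below in Python. The re.IGNORECASE flag is an assumption (the regex above does not carry a flag, but the question asks for case-insensitive matching), and the names pattern, urls and matched are only illustrative.

```python
import re

# Pattern from the answer above. IGNORECASE is an assumption added here so
# that "HTTPs" and "joBs" are handled as the question requires.
pattern = re.compile(r"\b(?:https?:\/\/)?[^\/:\n]+\/.*?job", re.IGNORECASE)

urls = [
    "http://www.glassdoor.com/job/ABC",
    "https://glassdoor.com/job/",
    "HTTPs://job.com/test",
    "Www.glassdoor.com/foo/bar/joBs",
    "http://192.168.1.1/ABC/job",
    "http://bankers.jobs/ABC/job",
    "HTTPs://jobs.com/test",
]

# Collect the URLs where "job" is found after the first path slash.
matched = [url for url in urls if pattern.search(url)]

for url in urls:
    print("match   " if url in matched else "no match", url)
```

With these inputs, the two URLs whose only "job" is in the domain (job.com and jobs.com) are not matched; the other five are.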

Upvotes: 1

Zzz...

Reputation: 291

I was also asked this question in an interview, and here is my solution: /./+job/?./i. It works well on Rubular.com.

Upvotes: 0

user557597

Reputation:

If you don't need to validate the URL, just focus on 'job':

 #  /(?i)(?<=\/)job(?=\/|[^\S\r\n]*$)/

 (?i)
 (?<= / )
 job
 (?= / | [^\S\r\n]* $ )
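
As a rough check, the compact form of this pattern can be run in Python as sketched below; the names segment, urls and hits are illustrative only. Note that this pattern requires job to be a complete path segment (a slash before it, and a slash or the end of the line after it), so under that reading joBs does not count as a match.

```python
import re

# Compact form of the pattern above: "job" directly after a slash, followed
# by a slash or by optional horizontal whitespace and the end of the line.
segment = re.compile(r"(?i)(?<=/)job(?=/|[^\S\r\n]*$)")

urls = [
    "http://www.glassdoor.com/job/ABC",
    "https://glassdoor.com/job/",
    "HTTPs://job.com/test",
    "Www.glassdoor.com/foo/bar/joBs",
    "http://192.168.1.1/ABC/job",
    "http://bankers.jobs/ABC/job",
]

# Map each URL to whether a standalone "job" path segment was found.
hits = {url: bool(segment.search(url)) for url in urls}

for url, found in hits.items():
    print(f"{found!s:5}  {url}")
```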

Upvotes: 2

Jess

Reputation: 25157

Here is one that I came up with:

^(?:.*://)?(?:[wW]{3}\.)?([^:/])*/.*job.*

It matches all of your examples except the ones where job appears only in the domain, such as job.com or jobs.com.

I tested this in Sublime Text, which is nice because the regex result is highlighted as you type.
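
For anyone who wants to repeat the check outside an editor, here is a small Python sketch. It assumes case-insensitive matching via re.IGNORECASE (the pattern itself does not say so, and without it joBs would not match); the names pattern, urls and outcome are illustrative.

```python
import re

# Pattern from the answer above. IGNORECASE is an assumption so that
# "joBs" in the path is matched by the literal "job".
pattern = re.compile(r"^(?:.*://)?(?:[wW]{3}\.)?([^:/])*/.*job.*", re.IGNORECASE)

urls = [
    "http://www.glassdoor.com/job/ABC",
    "https://glassdoor.com/job/",
    "HTTPs://job.com/test",
    "Www.glassdoor.com/foo/bar/joBs",
    "http://192.168.1.1/ABC/job",
    "http://bankers.jobs/ABC/job",
    "HTTPs://jobs.com/test",
]

# re.match anchors at the start of the string, like the leading ^.
outcome = {url: pattern.match(url) is not None for url in urls}

for url, ok in outcome.items():
    print(f"{ok!s:5}  {url}")
```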

Upvotes: 1
