Jhankar Mahbub

Reputation: 9848

Match text between double or single quotes

I need some regex help. I have been banging my head on this for the last few hours. I need to match some strings in a minified file.

Sample string:

var a ='abc'; var b = 'http://a/that.dude.js/v1/'; var c = 'def'; var d = 'https://b/that.dude.js/v1/';
var basePath = "http://othersite/that.dude.js/v1/";

I want to match the full text inside single or double quotes that contains that.dude.js/v1. I tried:

/('|").+that.dude.js\/v1\/('|")/g

...but this matches the full line when there are multiple occurrences in the same line.

My expected matches are:

http://a/that.dude.js/v1/
https://b/that.dude.js/v1/
http://othersite/that.dude.js/v1/

Here is what I have tried: http://regexr.com/3cv62

Upvotes: 3

Views: 1278

Answers (2)

Wiktor Stribiżew

Reputation: 626853

If you have single quotes inside double quoted strings, you need to capture the quote delimiter and use a backreference to match exactly the same trailing delimiter:

(['"])([^"'\s]*that\.dude\.js\/v1[^"'\s]*)\1

See the regex demo.

Since you have URLs, you can safely match them with [^"'\s]* (zero or more symbols other than ", ' and whitespace). The regex matches:

  • (['"]) - the leading quote delimiter (captured into Group 1 so that we could match the same trailing delimiter)
  • ([^"'\s]*that\.dude\.js\/v1[^"'\s]*) - Group 2 matching
    • [^"'\s]* - 0+ symbols other than ", ' and whitespace
    • that\.dude\.js\/v1 - that.dude.js/v1
    • [^"'\s]* - same as above: 0+ symbols other than ", ' and whitespace
  • \1 - trailing delimiter that is the same as the leading one

The result will be in Group 2:

var re = /(['"])([^"'\s]*that\.dude\.js\/v1[^"'\s]*)\1/g;
var str = 'var a =\'abc\'; var b = \'http://a/that.dude.js/v1/\'; var c = \'def\'; var d = \'https://b/that.dude.js/v1/\';\nvar basePath = "http://othersite/that.dude.js/v1/";';
var res = [];
var m; // declared here so the loop does not create an implicit global

while ((m = re.exec(str)) !== null) {
  res.push(m[2]); // Group 2 holds the text between the quotes
}
document.body.innerHTML = "<pre>" + JSON.stringify(res, null, 4) + "</pre>";

Note that to make it even more generic, you could use a tempered greedy token:

(['"])((?:(?!\1).)*that\.dude\.js\/v1(?:(?!\1).)*)\1
       ^^^^^^^^^^^^                  ^^^^^^^^^^^^  

See another demo

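The (?:(?!\1).) token matches a single character other than a line terminator, provided that character is not the quote captured in Group 1 (the value the \1 backreference refers to). The * quantifier after it repeats that check for every character, so the match cannot run past the closing delimiter.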
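To illustrate the difference between the two patterns, here is a minimal sketch. The sample string is made up for the demo (it is not from the question): a double-quoted URL that contains a space and an apostrophe. The helper name collect is also just an assumption for illustration.

```javascript
// Hypothetical input: a double-quoted string that contains a space and a single quote.
var sample = 'var p = "http://a/it\'s here/that.dude.js/v1/"; var q = \'x\';';

// First pattern: the content may not contain quotes or whitespace.
var strict = /(['"])([^"'\s]*that\.dude\.js\/v1[^"'\s]*)\1/g;
// Tempered greedy token: the content may contain anything except the opening quote.
var tempered = /(['"])((?:(?!\1).)*that\.dude\.js\/v1(?:(?!\1).)*)\1/g;

// Collect Group 2 (the text between the quotes) for every match.
function collect(re, text) {
  var out = [];
  var m;
  re.lastIndex = 0; // reset, since global regexes keep state between exec calls
  while ((m = re.exec(text)) !== null) {
    out.push(m[2]);
  }
  return out;
}

var strictMatches = collect(strict, sample);
var temperedMatches = collect(tempered, sample);

console.log(strictMatches);   // []
console.log(temperedMatches); // ["http://a/it's here/that.dude.js/v1/"]
```

The first pattern finds nothing here because the apostrophe and the space interrupt [^"'\s]*, while the tempered version only stops at the same quote character that opened the string.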

Upvotes: 2

Adam

Reputation: 5233

Try this one:

/(["'])[^"']+that\.dude\.js\/v1\/\1/g

The key change is replacing . with [^"'], which doesn't allow quote characters between the delimiting quotes. The \1 backreference also makes the closing quote match the opening one.
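As a rough sketch of how this could be used on the sample from the question: the pattern has no group around the content, so each match includes the surrounding quotes, and the snippet below strips them with slice. The variable names quoted and urls are only illustrative.

```javascript
// Sample input from the question, split over two lines for readability.
var str = 'var a =\'abc\'; var b = \'http://a/that.dude.js/v1/\'; var c = \'def\'; var d = \'https://b/that.dude.js/v1/\';\n' +
          'var basePath = "http://othersite/that.dude.js/v1/";';

var re = /(["'])[^"']+that\.dude\.js\/v1\/\1/g;

// With the g flag, match() returns the full matches (quotes included), or null if none.
var quoted = str.match(re) || [];

// Drop the leading and trailing quote from each match.
var urls = quoted.map(function (s) {
  return s.slice(1, -1);
});

console.log(urls);
// ["http://a/that.dude.js/v1/", "https://b/that.dude.js/v1/", "http://othersite/that.dude.js/v1/"]
```

Note that this pattern requires at least one character before that.dude.js and a trailing / immediately before the closing quote, so a string such as 'that.dude.js/v1' on its own would not match.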

Upvotes: 2
