Tom Gullen

Reputation: 61737

Javascript Regexp loop all matches

I'm trying to do something similar to Stack Overflow's rich text editor. Given this text:

[Text Example][1]

[1][http://www.example.com]

I want to loop over each [string][int] that is found, which I do this way:

var Text = "[Text Example][1]\n[1][http://www.example.com]";
// Find resource links
var arrMatch = null;
var rePattern = new RegExp(
  "\\[(.+?)\\]\\[([0-9]+)\\]",
  "gi"
);
while (arrMatch = rePattern.exec(Text)) {
  console.log("ok");
}

This works great, it logs 'ok' for each [string][int]. What I need to do, though, is for each match found, replace the initial match with components of the second match.

So in the loop $2 would represent the int part originally matched, and I would run this regexp (pseudo):

while (arrMatch = rePattern.exec(Text)) {
    var FindIndex = arrMatch[2]; // The "$2" group; this would be 1 in our example
    var reReference = new RegExp("\\[" + FindIndex + "\\]\\[(.+?)\\]", "g");

    // Replace original match now with hyperlink
}

This would match

[1][http://www.example.com]

End result for first example would be:

<a href="http://www.example.com" rel="nofollow">Text Example</a>

Edit

I've gotten as far as this now:

var Text = "[Text Example][1]\n[1][http://www.example.com]";
// Find resource links
var reg = new RegExp(
  "\\[(.+?)\\]\\[([0-9]+)\\]",
  "gi");
var result;
while ((result = reg.exec(Text)) !== null) {
  var LinkText = result[1];
  var Match = result[0];
  // Replace the literal matched text; passing it to new RegExp would treat the brackets as character classes
  Text = Text.replace(Match, '<a href="#">' + LinkText + '</a>');
}
console.log(Text);

Upvotes: 40

Views: 51571

Answers (7)

Slavik Meltser

Reputation: 10361

I know this is old, but since I stumbled upon this post, I want to straighten things out.

First of all, your approach to this problem is too complicated, and when the solution to a supposedly simple problem becomes too complicated, it is time to stop and think about what went wrong. Second, your solution is inefficient: you first find each placeholder you want to replace, and then for each one you search the same text again for the referenced link. So the computational complexity eventually becomes O(n^2).

It is disappointing to see so many upvotes on a flawed approach, because people who come here mostly learn from the accepted solution, assume it is a legitimate answer, and reuse the concept in their own projects, which then end up badly implemented.

The approach to this problem is pretty simple. All you need to do is find all the referenced links in the text, save them in a dictionary, and only then search for the placeholders to replace, using the dictionary. That's it. It is that simple! And in this case the complexity is just O(n).

So this is how it goes:

const text = `
 [2][https://en.wikipedia.org/wiki/Scientific_journal][5][https://en.wikipedia.org/wiki/Herpetology]

The Wells and Wellington affair was a dispute about the publication of three papers in the Australian Journal of [Herpetology][5] in 1983 and 1985. The publication was established in 1981 as a [peer-reviewed][1] [scientific journal][2] focusing on the study of [3][https://en.wikipedia.org/wiki/Amphibian][amphibians][3] and [reptiles][4] ([herpetology][5]). Its first two issues were published under the editorship of Richard W. Wells, a first-year biology student at Australia's University of New England. Wells then ceased communicating with the journal's editorial board for two years before suddenly publishing three papers without peer review in the journal in 1983 and 1985. Coauthored by himself and high school teacher Cliff Ross Wellington, the papers reorganized the taxonomy of all of Australia's and New Zealand's [amphibians][3] and [reptiles][4] and proposed over 700 changes to the binomial nomenclature of the region's herpetofauna.
[1][https://en.wikipedia.org/wiki/Academic_peer_review]    
[4][https://en.wikipedia.org/wiki/Reptile]          
`;

const linkRefs = {};
const linkRefPattern = /\[(?<id>\d+)\]\[(?<link>[^\]]+)\]/g;
const linkPlaceholderPattern = /\[(?<text>[^\]]+)\]\[(?<refid>\d+)\]/g;

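const parsedText = text
    // First pass: store every [id][url] definition in linkRefs and remove it from the text.
    .replace(linkRefPattern, (...args) => {
        const ref = args[args.length - 1]; // the named groups object is the last callback argument
        linkRefs[ref.id] = ref.link;
        return '';
    })
    // Second pass: turn every [text][id] placeholder into an anchor tag.
    .replace(linkPlaceholderPattern, (...args) => {
        const placeholder = args[args.length - 1];
        return `<a href="${linkRefs[placeholder.refid]}">${placeholder.text}</a>`;
    })
    .trim();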

console.log(parsedText);
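
For comparison, here is a minimal sketch of the same two-pass idea run on the short input from the question. The helper name `renderLinks` and the `[Other][9]` placeholder are made up for illustration, and it assumes that a placeholder whose id has no definition should be left untouched rather than rendered with an undefined href.

```javascript
// Hypothetical helper, not part of the answer above.
function renderLinks(source) {
  const refs = new Map();
  // Pass 1: collect [id][url] definitions and strip them from the text.
  const body = source.replace(/\[(\d+)\]\[([^\]]+)\]/g, (match, id, url) => {
    refs.set(id, url);
    return '';
  });
  // Pass 2: replace [text][id] placeholders; keep the original text if the id is unknown.
  return body
    .replace(/\[([^\]]+)\]\[(\d+)\]/g, (match, label, id) =>
      refs.has(id) ? `<a href="${refs.get(id)}">${label}</a>` : match)
    .trim();
}

console.log(renderLinks('[Text Example][1] and [Other][9]\n[1][http://www.example.com]'));
// <a href="http://www.example.com">Text Example</a> and [Other][9]
```

Each pass scans the text once, and the Map lookup avoids searching the text again for every placeholder.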

Upvotes: -3

Ruslan López

Reputation: 4477

Use a back-reference to restrict the match, so that the pattern matches if your text is:

[Text Example][1]\n[1][http://www.example.com]

and the code will not match if your text is:

[Text Example][1]\n[2][http://www.example.com]

var re = /\[(.+?)\]\[([0-9]+)\]\s*.*\s*\[(\2)\]\[(.+?)\]/gi;
var str = '[Text Example][1]\n[1][http://www.example.com]';
var subst = '<a href="$4">$1</a>';

var result = str.replace(re, subst);
console.log(result);

\number is used inside a regex to refer back to a numbered capture group, and $number is used in the same way in the replacement string passed to replace, to insert that group's captured text.
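
A smaller illustration of the two notations, separate from the link example. It is only a sketch, and the sample sentence is made up: `\1` in the pattern requires a repeat of whatever group 1 captured, and `$1` in the replacement string inserts that captured text.

```javascript
// \1 inside the pattern must repeat whatever group 1 captured;
// $1 inside the replacement string inserts that captured text.
const doubledWord = /\b(\w+) \1\b/g;
const cleaned = 'this is is a test test'.replace(doubledWord, '$1');
console.log(cleaned); // "this is a test"
```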

Upvotes: 1

Mario Vázquez

Reputation: 777

Another way to iterate over all matches without relying on the subtleties of exec and match is to use the string replace function with the regex as the first parameter and a function as the second one. When used like this, the function receives the whole match as its first argument, the captured groups as the next arguments, and then the index of the match followed by the whole input string:

var text = "[Text Example][1]\n[1][http://www.example.com]";
// Find resource links
var rePattern = new RegExp("\\[(.+?)\\]\\[([0-9]+)\\]", "gi");
text.replace(rePattern, function(match, g1, g2, index){
    // Do whatever
})

You can even iterate over all groups of each match using the function's local arguments object, excluding the first entry (the whole match) and the last two (the index and the input string).
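
As a rough sketch of that idea, the callback below uses rest parameters instead of arguments to separate the groups from the trailing values. The names `seen`, `rest` and `offset` are only illustrative, and the slicing assumes a pattern without named groups (with named groups, one more trailing argument is passed).

```javascript
const text = '[Text Example][1]\n[1][http://www.example.com]';
const pattern = /\[(.+?)\]\[([0-9]+)\]/g;
const seen = [];

text.replace(pattern, (match, ...rest) => {
  // rest = [group1, group2, offset, wholeString] for this pattern,
  // so the last two entries are the offset and the input string.
  const groups = rest.slice(0, -2);
  const offset = rest[rest.length - 2];
  seen.push({ match, groups, offset });
  return match; // leave the text unchanged; we only want to iterate
});

console.log(seen);
```

Only `[Text Example][1]` matches here, so `seen` ends up with a single entry whose groups are "Text Example" and "1".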

Upvotes: 0

Vasyl Gutnyk

Reputation: 5039

Here we use the exec method in a while loop to get every match along with the position of the matched string.

    var input = "A 3 numbers in 333";
    var regExp = /\b(\d+)\b/g, match;
    while (match = regExp.exec(input))
      console.log("Found", match[1], "at", match.index);
    // → Found 3 at 2
    //   Found 333 at 15
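
If the environment supports String.prototype.matchAll (ES2020), the same loop can be sketched without managing the regex state by hand. This is an alternative to the exec loop, not a replacement for it, and the `found` array is only there to collect the output.

```javascript
const input = 'A 3 numbers in 333';
const found = [];
// matchAll requires the g flag; each item carries the groups and the index, like an exec result.
for (const match of input.matchAll(/\b(\d+)\b/g)) {
  found.push(`Found ${match[1]} at ${match.index}`);
}
console.log(found.join('\n'));
// Found 3 at 2
// Found 333 at 15
```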

Upvotes: 6

s4y

Reputation: 51685

I agree with Jason that it’d be faster/safer to use an existing Markdown library, but you’re looking for String.prototype.replace (also, use RegExp literals!):

var Text = "[Text Example][1]\n[1][http://www.example.com]";
var rePattern = /\[(.+?)\]\[([0-9]+)\]/gi;

console.log(Text.replace(rePattern, function(match, text, urlId) {
  // return an appropriately-formatted link
  return `<a href="${urlId}">${text}</a>`;
}));

Upvotes: 40

Tom Gullen

Reputation: 61737

I managed to do it in the end with this:

var Text = "[Text Example][1]\n[1][http://www.example.com]";
// Find resource links
var reg = new RegExp(
  "\\[(.+?)\\]\\[([0-9]+)\\]",
  "gi");
var result;
while (result = reg.exec(Text)) {
  var LinkText = result[1];
  var Match = result[0];
  var LinkID = result[2];
  var FoundURL = new RegExp("\\[" + LinkID + "\\]\\[(.+?)\\]", "g").exec(Text);
  Text = Text.replace(Match, '<a href="' + FoundURL[1] + '" rel="nofollow">' + LinkText + '</a>');
}
console.log(Text);

Upvotes: 33

Jason McCreary

Reputation: 72971

This format is based on Markdown. There are several JavaScript ports available. If you don't want the whole syntax, then I recommend stealing the portions related to links.

Upvotes: 0
