Ryυĸ Kυяαɪ

Reputation: 15

Auto-link words on a page using JavaScript

The code below is meant to turn certain words on a page into links, but it works only with English words. I would like it to work with Arabic words too.

The code

    <script>
// <![CDATA[
document.addEventListener("DOMContentLoaded", function(){
   var links = {
      "مغامرات": "https://www.example.com/search/label/%D9%85%D8%BA%D8%A7%D9%85%D8%B1%D8%A7%D8%AA",
      "East": "https://www.example.com/search/label/%D8%AE%D9%8A%D8%A7%D9%84",
   }
   
   var bodi = document.querySelectorAll("body *:not(script)");
   for(var x=0; x<bodi.length; x++){
      var html = bodi[x].innerHTML;
      for(var i in links){
         var re = new RegExp("([\\s|&nbsp;]"+i+"(?:(?=[,<.\\s])))", "gi");
         var matches = html.match(re);
         if(matches){
            matches = html.match(re)[0].trim();
            html = html.replace(re, function(a){
               return ' <a href="'+links[i]+'">'+a.match(/[A-zÀ-ú]+/)[0].trim()+'</a>';
            });
         }
      }
      bodi[x].innerHTML = html;
   }
});
// ]]>
</script>

Upvotes: 0

Views: 259

Answers (1)

MMMahdy-PAPION

Reputation: 1101

Let me replace your approach with a simpler, easier-to-understand one.

In this example, a simple function detects the words with a dynamically built RegEx and replaces each match with an anchor (<a>) tag that points to its link:

function linkWords(elem,words,links) {
  // Use innerHTML so the anchor tags can be inserted as plain text
  elem.innerHTML=elem.innerHTML.replace(
    // Build a RegEx (g: global, i: case-insensitive) by joining the words, one capture group per word
    // (!) Each capture group is passed to the callback as an argument, in order
    RegExp('('+words.join(')|(')+')','gi'),
      // The callback receives its arguments like this:
      // [match, capture groups..., offset, string]
      function(){
        // So skip the first argument and the last two
        for (var i=1;i<arguments.length-2;i++)
          // The group that is not undefined is the word that matched
          if (arguments[i])
            // Wrap the match in an anchor tag, using the link at the same index
            return '<a href="'+links[i-1]+'">'+arguments[0]+'</a>';
      }
  );
}
// Pass a function to the listener; calling linkWords(...) directly here would run it
// immediately and register its return value (undefined) as the listener
document.addEventListener("DOMContentLoaded", function(){
  linkWords(document.body,
    ["كلمة","word"],
    ["https://www.example.com/search/%D9%83%D9%84%D9%85%D8%A9","https://www.example.com/search/word"]
  );
});
<div>Hi! this is a word and it have to be linked.</div>
<div dir="rtl">السلام علیکم! هذه كلمة ويجب ربطها.</div>
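
To make the comments about the callback arguments concrete, here is a small sketch (not part of the snippet above) that records what `String.prototype.replace` passes to the callback. The sample sentence and the `calls` array are made up for illustration.

```javascript
// Record every argument list that replace() passes to the callback.
var calls = [];
var pattern = RegExp('(' + ['كلمة', 'word'].join(')|(') + ')', 'gi');

'A word here'.replace(pattern, function () {
  calls.push(Array.prototype.slice.call(arguments));
  return arguments[0]; // leave the text unchanged
});

// calls[0] is [match, group 1, group 2, offset, whole string]:
// [ 'word', undefined, 'word', 2, 'A word here' ]
console.log(calls[0]);
```

Group 1 is undefined because the Arabic word did not match here, which is why the loop in linkWords looks for the first group that is not undefined.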

To understand what happens in the code above, you can read more in these resources:

Important notes: this function is only a first step for understanding. It uses an experimental approach, not a standard (trustworthy) one, because of these possible problems:

  1. Characters that are reserved in RegEx are not escaped.
  2. Using innerHTML without special care may miss HTML-encoded characters (entities).
  3. Arabic/Persian have some characters that look the same but have different Unicode code points, so they have to be treated as the same character. Examples: the Ya letters ي ى ی, the digits 4 ٤ ۴, etc.
  4. It cannot tell readable text (Node.textContent) apart from attributes or from the text of non-readable tags such as <style> or <script>.
  5. The inputs need a lot of pre- and post-processing to avoid mistakes. For example, overlapping inputs such as ['win','window'] can make a mess, and already-linked words are not detected.
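
As a rough illustration of notes 1, 3 and 5, the sketch below builds the pattern more defensively. It is only one possible approach, not a complete solution: `escapeRegExp`, `tolerantSource`, `buildLinkPattern` and the `LOOKALIKES` table are hypothetical names introduced here, and the table is deliberately incomplete.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical and deliberately incomplete table of look-alike characters.
var LOOKALIKES = [
  '\u064A\u0649\u06CC', // Arabic yeh, alef maksura, Farsi yeh
  '\u0643\u06A9',       // Arabic kaf, keheh
  '4\u0664\u06F4'       // digit four: ASCII, Arabic-Indic, extended Arabic-Indic
];

// Turn one word into a pattern source that accepts any look-alike variant.
function tolerantSource(word) {
  return Array.from(word, function (ch) {
    var group = LOOKALIKES.find(function (g) { return g.indexOf(ch) !== -1; });
    return group ? '[' + group + ']' : escapeRegExp(ch);
  }).join('');
}

// Build the RegExp plus the links, reordered to match its capture groups.
function buildLinkPattern(words, links) {
  var entries = words.map(function (word, i) {
    return { word: word, link: links[i] };
  });
  // Longest first, so that "window" is tried before "win".
  entries.sort(function (a, b) { return b.word.length - a.word.length; });
  var source = entries.map(function (e) {
    return '(' + tolerantSource(e.word) + ')';
  }).join('|');
  return {
    pattern: new RegExp(source, 'gi'),
    links: entries.map(function (e) { return e.link; })
  };
}
```

Because the links are reordered together with the words, capture group i still corresponds to links[i-1], so the returned pattern and links can be used with the same kind of replace callback as in linkWords.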

Also, this kind of processing should usually be done server-side, to avoid many possible client-side mistakes.

So if you want to keep doing it on the client side (front end):
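
One possible direction, sketched here under assumptions rather than taken from the answer, is to stop editing innerHTML and work on text nodes only, which avoids touching attributes, <script>, <style> and existing anchors. The names `splitByWords` and `linkTextNodes` are hypothetical, and the words are assumed to contain no RegEx special characters.

```javascript
// Pure helper: split plain text into pieces and mark the pieces to be linked.
function splitByWords(text, words, links) {
  var pattern = new RegExp('(' + words.join(')|(') + ')', 'gi');
  var pieces = [], last = 0, m;
  while ((m = pattern.exec(text)) !== null) {
    if (m[0] === '') { pattern.lastIndex++; continue; } // guard against empty matches
    if (m.index > last) pieces.push({ text: text.slice(last, m.index) });
    // Find which capture group matched; it has the same index as its link.
    for (var i = 1; i < m.length; i++) {
      if (m[i] !== undefined) {
        pieces.push({ text: m[0], href: links[i - 1] });
        break;
      }
    }
    last = m.index + m[0].length;
  }
  if (last < text.length) pieces.push({ text: text.slice(last) });
  return pieces;
}

// DOM part: visit text nodes only and skip text inside <a>, <script>, <style>.
function linkTextNodes(root, words, links) {
  var walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  var nodes = [];
  while (walker.nextNode()) nodes.push(walker.currentNode);
  nodes.forEach(function (node) {
    var parent = node.parentElement;
    if (parent && parent.closest('a, script, style')) return;
    var pieces = splitByWords(node.nodeValue, words, links);
    if (!pieces.some(function (p) { return p.href; })) return;
    var fragment = document.createDocumentFragment();
    pieces.forEach(function (p) {
      if (p.href) {
        var a = document.createElement('a');
        a.href = p.href;
        a.textContent = p.text;
        fragment.appendChild(a);
      } else {
        fragment.appendChild(document.createTextNode(p.text));
      }
    });
    node.replaceWith(fragment);
  });
}

// Run only in a browser, once the page has been parsed.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    linkTextNodes(document.body, ['كلمة', 'word'],
      ['https://www.example.com/search/%D9%83%D9%84%D9%85%D8%A9', 'https://www.example.com/search/word']);
  });
}
```

Building the anchors with createElement and textContent also sidesteps the HTML-encoding concern, since no markup is ever parsed from the page text.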


Update

If we want to avoid linking words that are already linked, and we are happy with a simplistic solution, we can improve the RegEx by adding a negative lookahead to the pattern.

Live example showing how it works:
https://regexr.com/6c00r

Visualized pattern:
https://jex.im/regulex/#!flags=ig&re=(%3F!%5B%5E%3E%5D*%3C%5C%2Fa%3E)(%3F%3A(word)%7C(%D9%83%D9%84%D9%85%D8%A9))

function linkWords(elem, words, links) {
    elem.innerHTML = elem.innerHTML.replace(
        // Improved RegEx: the negative lookahead rejects matches that sit inside an <a> tag
        RegExp('(?![^>]*</a>)(?:(' + words.join(')|(') + '))', 'gi'),
        function() {
            for (var i = 1; i < arguments.length - 2; i++)
                if (arguments[i])
                    return '<a href="' + links[i - 1] + '">' + arguments[0] + '</a>';
        }
    );
}
// Pass a function to the listener; calling linkWords(...) directly here would run it
// immediately and register its return value (undefined) as the listener
document.addEventListener("DOMContentLoaded", function() {
    linkWords(
        document.querySelector('.me'), // <---- First argument: the target element
        ["كلمة", "word"], // Array of the targeted words
        ["https://www.example.com/search/%D9%83%D9%84%D9%85%D8%A9", "https://www.example.com/search/word"] // Array of the links, in the same order as the words
    );
});
.me {background: #efefef;}
a {text-decoration: underline;}
<div class="me">
  <div>Hi! this is a word and it have to be linked.</div>
  <div dir="rtl">السلام علیکم! هذه كلمة ويجب ربطها.</div>
  <div>Not this <a>word</a> that already is inside an anchor tag. But still this WORD.</div>
</div>
<br>
<div>Another element: word word word</div>

Explaining the RegEx:

  • (?![^>]*<\/a>) is a negative lookahead (?!...). At the current position it checks that the text ahead does NOT:
    • Start with [^>] characters (anything except >, the character that ends a tag)
      • As many of them as possible *
        • Followed directly by a closing anchor tag </a>
  • Then the non-capturing group (?:...) (non-capturing, so that it does not change the indexes of the other groups) must match:
    • Either (word) (group number 1 in the visualized pattern)
    • Or |
    • (كلمة) (group number 2 in the visualized pattern)
    • Or more alternatives: (word1)|(word2)|(word3)|...
  • The flags in /.../gi control how matching proceeds:
    • g (Global): keep going and find every match
    • i (Ignore Case): be case-insensitive (A=a)
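
The behaviour described in the list above can be checked on a plain string, without any DOM. This is a small sketch with a made-up sample string; the `found` array is introduced here only to record the matches.

```javascript
// Same pattern shape as above: negative lookahead plus one capture group per word.
var words = ['كلمة', 'word'];
var pattern = RegExp('(?![^>]*</a>)(?:(' + words.join(')|(') + '))', 'gi');

var html = 'a word, <a>word</a>, WORD';
var found = [];
html.replace(pattern, function (match, g1, g2, offset) {
  found.push(match + '@' + offset);
  return match; // leave the text unchanged
});

// The "word" inside <a>...</a> (offset 11) is skipped.
console.log(found); // [ 'word@2', 'WORD@21' ]
```

Note that the lookahead only sees text up to the next > character, so a word nested more deeply inside an anchor, for example <a><b>word</b></a>, is still matched. This is part of why the approach is described as simplistic.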

Upvotes: 1
