victorsc

Reputation: 722

Get exact browser rendered text (RTL and LTR direction mix)

Is there a way to retrieve the actual rendered text by a browser (in the context of Right-to-left text direction)?

<html dir="rtl">
<body>
  <p id='ko'>Hello (world)</p>
  <p id='ok'>Hello <bdo dir='ltr'>(world)</bdo></p>
</body>
</html>

This renders as:

[two screenshots of the rendered output]

But both document.getElementById('ok').textContent === document.getElementById('ko').textContent and document.getElementById('ok').innerText === document.getElementById('ko').innerText evaluate to true (in both browsers).

Is there a way to get the actual text that is displayed in the webpage?

https://jsfiddle.net/019kvo56/1/

Upvotes: 1

Views: 1399

Answers (1)

Kaiido

Reputation: 136678

There is a direction CSS property that you can read from, e.g., getComputedStyle(elem), but it only applies at the element level, so it can't tell you exactly how the browser rendered the text nodes.
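
To illustrate the element-level limitation, here is a minimal sketch. getDirection is a hypothetical helper name (it is not part of the demo further down), and its view parameter exists only so the function can be called outside a browser:

```javascript
// Hypothetical helper: returns the computed `direction` ('ltr' or 'rtl')
// of an element. `view` defaults to the global window.
function getDirection(elem, view = window) {
  return view.getComputedStyle(elem).direction;
}

// With the markup from the question, run in a browser:
// getDirection(document.getElementById('ko'));     // 'rtl'
// getDirection(document.getElementById('ok'));     // 'rtl'
// getDirection(document.querySelector('#ok bdo')); // 'ltr'
```

Both paragraphs report the same direction, so this value alone can't distinguish how their characters end up ordered on screen.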

So what you need to do is:

  • first, grab all the text nodes from your container (best done with a TreeWalker)
  • select each of their characters with a Range object
  • get each character's rendered position with the Range's getBoundingClientRect() method
  • sort the characters by position (top to bottom, then left to right)
  • join their text values back together
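
One thing to keep in mind for the "select each character" step: Range offsets inside a text node count UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for example) spans two offsets. A minimal sketch of computing per-character offsets that respects this, assuming a hypothetical helper named codePointOffsets (grapheme clusters such as base letter plus combining mark are still split):

```javascript
// Hypothetical helper: splits a text node's data into code points and
// returns, for each one, the start/end offsets usable with
// range.setStart(node, start) and range.setEnd(node, end).
function codePointOffsets(data) {
  var out = [];
  var offset = 0;
  for (var ch of data) { // the string iterator yields whole code points
    out.push({ text: ch, start: offset, end: offset + ch.length });
    offset += ch.length; // 1 for BMP characters, 2 for surrogate pairs
  }
  return out;
}
```

For plain ASCII text like the example in the question, this gives the same offsets as stepping one index at a time.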

Here is a live demo:

function getDisplayedText(container) {

  var r = document.createRange(); // to get our nodes positions

  var nodes = []; // first grab all the nodes
  var treeWalker = document.createTreeWalker(container, NodeFilter.SHOW_TEXT); // visit text nodes only
  while (treeWalker.nextNode()) nodes.push(treeWalker.currentNode);

  var chars = []; // then get all its contained characters
  nodes.forEach(n => {
    n.data.split('').forEach((c, i) => {
      r.setStart(n, i); // move the range to this character
      r.setEnd(n, i+1);
      chars.push({
        text: c,
        rect: r.getBoundingClientRect() // save our range's DOMRect
      })
    })
  });

  return chars.filter(c => c.rect.height) // keep only the displayed ones (i.e no script textContent)
    .sort((a, b) => { // sort ttb ltr
      if (a.rect.top === b.rect.top) {
        return a.rect.left - b.rect.left;
      }
      return a.rect.top - b.rect.top;
    })
    .map(n => n.text)
    .join('');
}

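// look the elements up explicitly instead of relying on id-named globals
console.log('ko : ', getDisplayedText(document.getElementById('ko')));
console.log('ok : ', getDisplayedText(document.getElementById('ok')));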
<div dir="rtl">
  <p id='ko'>Hello (world)</p>
  <p id='ok'>Hello <bdo dir='ltr'>(world)</bdo></p>
</div>
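
Note that the sort above compares rect.top with strict equality. Assuming that characters on the same visual line can report slightly different top values (mixed fonts or sub-pixel layout, for instance), a more tolerant variant of the sorting step could look like the sketch below. joinInVisualOrder and its tolerance parameter are hypothetical names, and the input is the same array of { text, rect } objects built in the demo:

```javascript
// Hypothetical variant of the sort/join step: groups characters into lines
// when their `top` values differ by at most `tolerance` pixels, then orders
// each line left to right.
function joinInVisualOrder(chars, tolerance) {
  if (tolerance === undefined) tolerance = 0.5; // assumed default, tune as needed
  var visible = chars
    .filter(function (c) { return c.rect.height > 0; }) // displayed ones only
    .sort(function (a, b) { return a.rect.top - b.rect.top; });

  var lines = [];
  visible.forEach(function (c) {
    var last = lines[lines.length - 1];
    if (last && Math.abs(c.rect.top - last.top) <= tolerance) {
      last.chars.push(c); // same visual line
    } else {
      lines.push({ top: c.rect.top, chars: [c] }); // start a new line
    }
  });

  return lines.map(function (line) {
    return line.chars
      .sort(function (a, b) { return a.rect.left - b.rect.left; })
      .map(function (c) { return c.text; })
      .join('');
  }).join('');
}
```

It takes plain objects, so it can be tried without a DOM by passing hand-written rects.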

As for why WebKit renders the last ) flipped and in first position, I have no idea whether that is correct or not.

Upvotes: 2
