Reputation: 18520
We build bespoke WordPress themes, and recently have been receiving complaints regarding the sequence of headings. Most automated tools, including Google's Lighthouse, suggest that you should never skip heading levels, in order to properly communicate page structure for screen readers and other accessibility tools.
This issue is largely due to the way our clients enter content. They tend to prefer picking a visually pleasing heading, rather than the "correct" heading sequentially, so we'll often end up with pages that have an h1, then an h4, then a set of h2s, and so on. We've told these clients that they can fix this by properly entering content, but this seems to be asking too much of them, much like entering alt text for images.
To "solve" this issue, I'm trying to write a filter that will parse the_content, identify all of the headings, and replace their tags so that they become sequential, retaining classes for styling. I realize that this isn't a perfect solution, as the intended heading structure really can't be assumed programmatically, but this is the only viable solution I've been able to determine (if someone has a better idea, please, do tell).
So, for example, the code the user generates could be something like this:
<h2 class="title--h2">This is a second level heading</h2>
<p>Etiam vitae erat ullamcorper ipsum ultrices convallis ac quis nulla. Nam euismod imperdiet enim eu venenatis. Nulla non bibendum dui. Maecenas id tincidunt orci. Sed pellentesque ipsum et tempor convallis. Etiam elementum augue aliquet enim venenatis tincidunt. Praesent nunc dolor, vulputate nec aliquet consectetur, aliquet nec elit. Vivamus non eros nec nibh vestibulum lacinia. Morbi diam turpis, accumsan ac fringilla eget, fringilla vitae lorem. Ut consequat tortor orci, sed lobortis metus facilisis nec. Nulla sed enim in tortor blandit aliquet. Curabitur a finibus mi.</p>
<h4 class="title--h4">This is a fourth level heading</h4>
<p>Nullam blandit, mauris vel vestibulum aliquet, quam lectus laoreet mi, id euismod ligula augue sit amet velit. Suspendisse suscipit lacus quis mauris varius, sed cursus mi auctor. Nullam non augue in ante malesuada blandit. Nam eu purus commodo, porttitor odio commodo, tristique nunc. Suspendisse vitae vehicula turpis. Aenean turpis nibh, auctor ac mollis congue, iaculis id tortor. Morbi in est erat. Proin aliquam varius neque a sollicitudin. Vestibulum varius in urna sit amet hendrerit.</p>
<h4 class="title--h4">This is a fourth level heading</h4>
<p>Donec vitae est sapien. Nulla facilisi. Quisque sed auctor ante, sed viverra elit. Quisque justo arcu, vulputate tempor odio ac, mollis blandit justo. Morbi viverra tincidunt leo vel mattis. Aliquam erat volutpat. Nunc tortor tellus, porta sit amet tellus sed, interdum condimentum ex. </p>
And the output would be:
<h2 class="title--h2">This is a second level heading</h2>
<p>Etiam vitae erat ullamcorper ipsum ultrices convallis ac quis nulla. Nam euismod imperdiet enim eu venenatis. Nulla non bibendum dui. Maecenas id tincidunt orci. Sed pellentesque ipsum et tempor convallis. Etiam elementum augue aliquet enim venenatis tincidunt. Praesent nunc dolor, vulputate nec aliquet consectetur, aliquet nec elit. Vivamus non eros nec nibh vestibulum lacinia. Morbi diam turpis, accumsan ac fringilla eget, fringilla vitae lorem. Ut consequat tortor orci, sed lobortis metus facilisis nec. Nulla sed enim in tortor blandit aliquet. Curabitur a finibus mi.</p>
<h3 class="title--h4">This is a fourth level heading</h3>
<p>Nullam blandit, mauris vel vestibulum aliquet, quam lectus laoreet mi, id euismod ligula augue sit amet velit. Suspendisse suscipit lacus quis mauris varius, sed cursus mi auctor. Nullam non augue in ante malesuada blandit. Nam eu purus commodo, porttitor odio commodo, tristique nunc. Suspendisse vitae vehicula turpis. Aenean turpis nibh, auctor ac mollis congue, iaculis id tortor. Morbi in est erat. Proin aliquam varius neque a sollicitudin. Vestibulum varius in urna sit amet hendrerit.</p>
<h4 class="title--h4">This is a fourth level heading</h4>
<p>Donec vitae est sapien. Nulla facilisi. Quisque sed auctor ante, sed viverra elit. Quisque justo arcu, vulputate tempor odio ac, mollis blandit justo. Morbi viverra tincidunt leo vel mattis. Aliquam erat volutpat. Nunc tortor tellus, porta sit amet tellus sed, interdum condimentum ex. </p>
Again, I realize this is going to lead to unintended structure (I included an example of this in the above demonstration), but this is what my clients are asking for, so I'm giving in.
The code I have so far will track the previous heading level and determine what the new level should be, but I'm having difficulty understanding how to actually replace the tags correctly. My understanding is that modifying the DOM with $node->replaceChild() is going to result in items getting skipped, because the DOM is changing while it's being parsed. Additionally, I'd like to retain all attributes on each heading, but I've been unable to locate a method for this; everything suggests copying individual attributes manually, but because this is CMS-driven, I'm worried that custom or unexpected attributes will be missed.
Here's the filter I have so far:
/**
 * Ensure heading levels are always in sequence
 *
 * @param string $content
 * @return string
 */
function namespace_fix_title_sequence(string $content): string {
    if (! (is_admin() && ! wp_doing_ajax()) && $content) {
        $DOM = new DOMDocument();
        /**
         * Use internal errors to get around HTML5 warnings
         */
        libxml_use_internal_errors(true);
        /**
         * Load in the content, with proper encoding and an `<html>` wrapper required for parsing
         */
        $DOM->loadHTML("<?xml encoding='utf-8' ?><html>{$content}</html>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
        /**
         * Clear errors to get around HTML5 warnings
         */
        libxml_clear_errors();
        /**
         * Use XPath to query headings
         */
        $XPath = new DOMXPath($DOM);
        $headings = $XPath->query("//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]");
        /**
         * Track previous heading level
         */
        $previous_level = 1;
        foreach ($headings as $heading) {
            /**
             * Get the current level
             */
            $current_level = intval(preg_replace("/^h/", "", $heading->nodeName));
            /**
             * Determine the target level
             */
            $target_level = ($current_level - $previous_level <= 1 ? $current_level : $previous_level + 1);
            /**
             * DEBUG
             */
            echo "<p>Previous: {$previous_level}</p>";
            echo "<p>Current: {$current_level}</p>";
            echo "<p>Target: {$target_level}</p>";
            echo "<hr />";
            /**
             * Replace current level with target level
             */
            // ?
            /**
             * Update the previous level
             */
            $previous_level = $target_level;
        }
        /**
         * Save changes, remove unneeded tags
         */
        $content = implode(array_map([$DOM->documentElement->ownerDocument, "saveHTML"], iterator_to_array($DOM->documentElement->childNodes)));
    }
    return $content;
}
add_filter("the_content", "namespace_fix_title_sequence", 100, 1);
Upvotes: 2
Views: 138
Reputation: 14687
In an ideal world, the best solution would be to prevent the content writer from selecting an incorrect heading level in the WYSIWYG interface in the first place, in the same way that you could require non-empty alt text for images and a label for input fields, forbid empty links, etc.
At any given place in the document, they would only be allowed to insert a heading of level 1 to N+1, where N is the level of the previous heading.
Consider that adjustments might also have to be propagated: changing an H3 into an H2 in the middle of the text should also change all the following H4s into H3s, down to the next H2, and so on recursively. As you can see, this is not as easy as it may seem at first.
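To make the propagation idea concrete, here is a minimal sketch in PHP. It models the document as a flat list of heading levels, and the helper name shift_section is made up for this illustration. It assumes that a heading's "section" is every following heading deeper than its original level, which is only one possible reading of the rule above.

```php
/**
 * Change the level of one heading and shift its sub-headings by the same amount.
 * Hypothetical helper: headings are modelled as a flat list of integer levels (1-6).
 */
function shift_section(array $levels, int $index, int $new_level): array {
    $old_level = $levels[$index];
    $delta = $new_level - $old_level;
    $levels[$index] = $new_level;

    for ($i = $index + 1; $i < count($levels); $i++) {
        // A heading at the same or a shallower original level ends the section.
        if ($levels[$i] <= $old_level) {
            break;
        }
        // Shift the sub-heading, keeping it within h1..h6.
        $levels[$i] = max(1, min(6, $levels[$i] + $delta));
    }

    return $levels;
}
```

For instance, shift_section([2, 3, 4, 4, 3, 2], 1, 2) turns the first H3 into an H2 and its two H4s into H3s, leaving the later H3 and H2 untouched.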
Sadly, not only is this hard both to develop and to use, but writers are probably not ready for it either. Those who don't understand the need for correct structure will probably regard the restriction as a bug, or as a stupid software limitation on their freedom to write however they like. You could perhaps decouple the heading level from its visual style to avoid frustration, but that quickly becomes even more complicated.
So the only things you can do are to educate content writers or, as you propose here, to try to fix the incorrect structure automatically.
Before getting into the real task of DOM manipulation, let's talk a little about the algorithm. It's of course impossible to restore the structure the author intended 100% of the time, but the goal is still to choose the most probable thing the author wanted to do.
If we take your example again, the author wrote H2, H4, H3, H3. Is the simplest fix, H2, H3, H3, H3, the most appropriate? What about H2, H3, H4, H4? A reasonable basis for the choice is that if two elements are visually different, they were probably intended to be at different levels and, conversely, if two elements are visually identical, they were probably intended to be at the same level.
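As one possible starting point, the sketch below uses a stack of open sections so that headings with the same original tag inside the same section end up at the same level. It implements only the "visually identical means same level" half of the heuristic, and it assumes the relative order of the original levels is meaningful. The function name normalize_heading_levels and the $parent_level parameter (the level of the heading that precedes the content, for example the page title's h1) are made up for this illustration.

```php
/**
 * Map a list of original heading levels to a sequence without skipped levels.
 * Hypothetical helper: $parent_level is the level of the heading preceding the content.
 */
function normalize_heading_levels(array $levels, int $parent_level = 1): array {
    $stack = [];  // Open sections as [original level, assigned level] pairs.
    $result = [];

    foreach ($levels as $level) {
        // Close every open section that is deeper than the current heading.
        while ($stack && end($stack)[0] > $level) {
            array_pop($stack);
        }

        if (!$stack) {
            // No open section: at most one level below the parent.
            $assigned = min($level, $parent_level + 1);
            $stack[] = [$level, $assigned];
        } elseif (end($stack)[0] === $level) {
            // Same original tag as the open section: keep the same level.
            $assigned = end($stack)[1];
        } else {
            // Deeper original tag: exactly one level below the open section.
            $assigned = end($stack)[1] + 1;
            $stack[] = [$level, $assigned];
        }

        $result[] = $assigned;
    }

    return $result;
}
```

With the markup from the question (h2, h4, h4), this returns 2, 3, 3, so the two visually identical headings stay at the same level instead of becoming h3 and h4.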
As far as I know, most DOM APIs I have seen (Java, JavaScript, PHP, C++, etc.) don't let you change an element's name in place; you must create a new node to do that. You can't simply change an H4 into an H3 while leaving its inner structure untouched, for example. So, since you can't change the element name in place, you need to:
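With PHP's DOM extension, one way to do this could look like the sketch below: create an element with the new name, copy every attribute across, move the child nodes, and swap the new element in with replaceChild(). The helper name rename_element is made up for this illustration, and the sketch assumes plain HTML attributes without namespaces.

```php
/**
 * Replace an element with a copy that has a different tag name,
 * keeping all attributes and child nodes. Hypothetical helper.
 */
function rename_element(DOMElement $old, string $new_name): DOMElement {
    $new = $old->ownerDocument->createElement($new_name);

    // Copy every attribute, whatever it is (class, id, data-*, ...).
    foreach ($old->attributes as $attribute) {
        $new->setAttribute($attribute->nodeName, $attribute->nodeValue);
    }

    // Move (not copy) the children; appendChild detaches them from $old.
    while ($old->firstChild) {
        $new->appendChild($old->firstChild);
    }

    $old->parentNode->replaceChild($new, $old);

    return $new;
}

// Usage: snapshot the node list into an array before modifying the tree,
// so the loop does not depend on how the list reacts to changes.
$doc = new DOMDocument();
$doc->loadHTML(
    '<div><h4 class="title--h4" data-x="1">A <em>b</em></h4></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);
$xpath = new DOMXPath($doc);
foreach (iterator_to_array($xpath->query('//h4')) as $heading) {
    rename_element($heading, 'h3');
}
echo $doc->saveHTML();
```

Because the attributes are copied in a loop rather than by name, custom or unexpected attributes entered through the CMS are carried over as well.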
Upvotes: 1