Reputation: 101
I am trying to remove the whitespace from an HTML file that was created from a Word document (export to HTML), but so far I have not succeeded.
For example:
<p dir="ltr" class="pt-ListParagraph">
<span class="pt-000003"> </span></p>
<p dir="ltr" class="pt-Normal-000001">
<span class="pt-DefaultParagraphFont-000002">Work Instruction</span></p>
<p dir="ltr" class="pt-Normal">
<span class="pt-000000"> </span></p>
<p dir="ltr" class="pt-Normal">
<span class="pt-DefaultParagraphFont">DEFINITIONS AND ACRONYMS</span></p>
<p dir="ltr" class="pt-BodyText"><span class="pt-000004"> </span></p>
<p dir="ltr" class="pt-Normal-000005">
<span class="pt-DefaultParagraphFont-000006">DO Brief </span>
<span class="pt-DefaultParagraphFont-000007"> </span>
I have tried the CSS selectors p span:empty and p span:blank, which do not work because the selector still sees the whitespace inside <span class="pt-000000"> </span>. I have also tried the suggestions from posts with similar titles, without success (jQuery is not an option), and I am at a loss. I would like to add a .js file in the head of the HTML that runs on page load and removes all of the whitespace-only spans (such as <span class="pt-000000"> </span>) that are generated when a Word document is converted to HTML. Can anyone offer some advice?
Removing the spans is an option. However, the span classes differ with every export of the Word document, so targeting them by class would mean writing a new set of span class selectors each time. I have thought about that, but figured it was just a band-aid for the issue.
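The check that the CSS selectors cannot express is straightforward in script, and it does not depend on the span's class at all. A minimal sketch, using a hypothetical helper named isBlankText, that treats text made only of whitespace (including the non-breaking space U+00A0 that Word often exports as &nbsp;) as blank:

```javascript
// Hypothetical helper: true when the text is empty or whitespace-only.
// In JavaScript regular expressions, \s also matches the non-breaking
// space U+00A0, so spans Word filled with &nbsp; count as blank too.
function isBlankText(text) {
  return /^\s*$/.test(text);
}

// In the browser, the test can be applied to every span regardless of class.
if (typeof document !== 'undefined') {
  var blank = Array.from(document.querySelectorAll('span')).filter(function (span) {
    return isBlankText(span.textContent);
  });
  console.log(blank.length + ' whitespace-only spans found');
}
```

Because the selection is by tag name and content rather than by class, the same script should keep working when a new export produces different pt-* class names.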
UPDATE
Running the code inside a window.addEventListener('load', ...) handler did the trick:
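window.addEventListener('load', function() {
  // getElementsByTagName returns a live collection, so count down:
  // removing a span while counting up would skip the span after it.
  var spans = document.getElementsByTagName('span');
  for (var i = spans.length - 1; i >= 0; i--) {
    if (spans[i].innerHTML.trim() === '') {
      spans[i].remove();
    }
  }
});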
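Why the loop direction matters: getElementsByTagName returns a live HTMLCollection, so when an element is removed while looping forward, the next element slides into the current index and the loop steps past it. Two adjacent blank spans (which Word exports readily produce) would leave one behind. The sketch below imitates that behaviour with a plain array, assuming that splicing an item out is a fair stand-in for a live collection shrinking; the function names are hypothetical:

```javascript
// Removing an item with splice shifts every later item down by one index,
// which mimics how a live HTMLCollection shrinks when a span is removed.
function removeBlankForward(items) {
  for (var i = 0; i < items.length; i++) {
    if (items[i].trim() === '') {
      items.splice(i, 1); // the next item slides into index i and is skipped
    }
  }
  return items;
}

function removeBlankBackward(items) {
  // Walking from the end means a removal never shifts an unvisited item.
  for (var i = items.length - 1; i >= 0; i--) {
    if (items[i].trim() === '') {
      items.splice(i, 1);
    }
  }
  return items;
}

console.log(removeBlankForward([' ', ' ', 'Work Instruction']));  // second blank survives
console.log(removeBlankBackward([' ', ' ', 'Work Instruction'])); // only the text remains
```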
Reputation: 2611
You could use getElementsByTagName to retrieve a list of all the span elements in your HTML file. Then walk through every span element and check whether it contains only whitespace. If so, remove it, or, if you prefer to keep the element, set its innerHTML to an empty string.
Example:
var spans = document.getElementsByTagName('span');
// The collection is live, so loop backwards to avoid skipping spans after a removal.
for (var i = spans.length - 1; i >= 0; i--) {
  if (spans[i].innerHTML.trim() === '') {
    spans[i].remove();
  }
}
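The answer text also mentions setting innerHTML to an empty string instead of removing the span. A minimal sketch of that variant, with a hypothetical function name clearBlankSpans and the root passed in as a parameter: because clearing does not remove any nodes, the live collection never shrinks and a forward loop is safe here. An emptied span also matches the CSS :empty selector afterwards.

```javascript
// Hypothetical helper: keep each whitespace-only span but empty it.
// No nodes are removed, so iterating the live collection forward is fine.
// Returns how many spans were cleared.
function clearBlankSpans(root) {
  var spans = root.getElementsByTagName('span');
  var cleared = 0;
  for (var i = 0; i < spans.length; i++) {
    if (spans[i].innerHTML.trim() === '') {
      spans[i].innerHTML = '';
      cleared++;
    }
  }
  return cleared;
}

if (typeof document !== 'undefined') {
  clearBlankSpans(document);
}
```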
Updated example, which waits until the page has finished loading:
window.addEventListener('load', function() {
  var spans = document.getElementsByTagName('span');
  // The collection is live, so loop backwards to avoid skipping spans after a removal.
  for (var i = spans.length - 1; i >= 0; i--) {
    if (spans[i].innerHTML.trim() === '') {
      spans[i].remove();
    }
  }
});
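The load event also waits for images and stylesheets. If the script sits in the head, as the question describes, an alternative is to run the cleanup as soon as the HTML has been parsed. A sketch assuming a hypothetical helper named onDomReady; document.readyState and the DOMContentLoaded event themselves are standard:

```javascript
// Hypothetical helper: run fn once the document has been parsed.
// While the page is still loading, wait for DOMContentLoaded;
// if parsing has already finished, run fn right away.
function onDomReady(doc, fn) {
  if (doc.readyState === 'loading') {
    doc.addEventListener('DOMContentLoaded', fn);
  } else {
    fn();
  }
}

if (typeof document !== 'undefined') {
  onDomReady(document, function () {
    var spans = document.getElementsByTagName('span');
    // Count down so removals do not skip entries in the live collection.
    for (var i = spans.length - 1; i >= 0; i--) {
      if (spans[i].innerHTML.trim() === '') {
        spans[i].remove();
      }
    }
  });
}
```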
Reputation: 115212
Get all the span elements using querySelectorAll() or getElementsByTagName(), convert the result to an array, iterate over it using Array#forEach, and remove each span whose text content is only whitespace using the remove() method.
Array.from(document.querySelectorAll('span')).forEach(function(ele) {
  // trim() returns '' for whitespace-only text, so !'' removes the span
  if (!ele.textContent.trim()) ele.remove();
});
// or, for browsers without Array.from:
[].slice.call(document.querySelectorAll('span')).forEach(function(ele) {
  if (!ele.textContent.trim()) ele.remove();
});
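One thing worth checking: a span that holds only an image or other element also has empty textContent, so the test above would remove it along with its content. A sketch that additionally requires the span to have no child elements, using the hypothetical function name removeBlankSpans with the root passed in as a parameter:

```javascript
// Hypothetical helper: remove spans whose text is whitespace-only
// (textContent turns &nbsp; into U+00A0, which trim() strips) and which
// contain no child elements, so a span wrapping an <img> is kept.
// Returns how many spans were removed.
function removeBlankSpans(root) {
  // querySelectorAll returns a static NodeList; Array.from copies it anyway,
  // so removing spans during the loop cannot skip any of them.
  var spans = Array.from(root.querySelectorAll('span'));
  var removed = 0;
  spans.forEach(function (ele) {
    if (!ele.textContent.trim() && ele.children.length === 0) {
      ele.remove();
      removed++;
    }
  });
  return removed;
}

if (typeof document !== 'undefined') {
  removeBlankSpans(document);
}
```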
Older browsers, such as Internet Explorer, do not support the remove() method; for those, use the following instead:
ele.parentNode.removeChild(ele);
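If the same script has to run in both older and current browsers, the two removal styles can be combined. A small sketch with a hypothetical helper named removeNode that prefers remove() when it exists and otherwise falls back to removeChild on the parent; it pairs naturally with the [].slice.call version above, which also avoids Array.from:

```javascript
// Hypothetical helper: detach a node whether or not the browser supports remove().
function removeNode(node) {
  if (typeof node.remove === 'function') {
    node.remove();
  } else if (node.parentNode) {
    // Fallback for browsers (e.g. Internet Explorer) without ChildNode.remove().
    node.parentNode.removeChild(node);
  }
}
```

Inside the forEach callback, removeNode(ele) would then replace the direct ele.remove() call.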