user4943236

Reputation: 6334

Parsing an HTML page with the HTML DOM or jQuery

I need to parse HTML content so that every header is paired with the content of the <li> tags that follow it. Any pointers would be appreciated. The generated HTML looks like this:

<div id="word_content">
<br>Testing Time: 2015-10-29 17:57:11<br>
        Total Age: 19<br>
        Total Friend: 9<br>
        Total Family: 10<br>
        <br>
    Here are the suggestions  - Him_530037_: <a href="www.mytarget.com" target="_blank">93358546</a>
    <h3>Overview</h3><br>
    <ul>
        <li>(The overlap provided is not good)</li>
    </ul>

    <h3>Structure</h3><br>
    <h4>Target:</h4><br>
    <ul>
        <li>Audience.</li>
        <li>Lookalike</li>
        <li>Overlap of Audience</li>            
    </ul>
</div>

The results should be:

Overview:The Overlap provided is not good
Structure:
Target: Audience, Lookalike, Overlap of audience

I was thinking of something along these lines, but I am not able to move forward:

        var nodes = document.getElementById("word_content");
        var $result = [];
        for (var i = 0; i < nodes.childNodes.length; i++)
        {
            // nodeValue is null for element nodes, so only text nodes are kept here
            if (nodes.childNodes[i].nodeValue != null)
                {
                    $result.push(nodes.childNodes[i].nodeValue);
                }
        }

Upvotes: 0

Views: 35

Answers (1)

vijayP

Reputation: 11502

You can refer to the jQuery code below. Note that it is written for your HTML as posted: it assumes each header is followed by a <br> and then, optionally, a <ul>. If you change the HTML, you will have to change the jQuery code accordingly.

$(document).ready(function () {
  // Collect every heading element (h1-h6) inside the container.
  var headTags = $("#word_content").find("h1, h2, h3, h4, h5, h6");

  var output = {};

  headTags.each(function () {
    var currentHead = $(this);

    // Each heading is followed by a <br>, so its list (if any) is two siblings ahead.
    var nextNextElem = currentHead.next().next();
    var innerText = [];
    if (nextNextElem.is("ul")) {
      nextNextElem.find("li").each(function () {
        innerText.push($(this).text().trim());
      });
    }

    // A heading with no list right after it (e.g. "Structure") keeps an empty array.
    output[currentHead.text()] = innerText;
  });

  alert(JSON.stringify(output));
  console.log(output);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
<div id="word_content">
<h3>Overview</h3><br>
<ul>
<li>(The overlap provided is not good)</li>
</ul>
<h3>Structure</h3><br>
<h4>Target:</h4><br>
        <ul>
            <li> Audience.</li>
            <li>Lookalike</li>
            <li>Overlap of Audience </li>           
        </ul></div>
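
If the markup is likely to change, a less position-dependent variant may be easier to maintain. The following is only a sketch without jQuery, and it assumes the headings and lists are direct children of the container. It walks the child elements in order and attaches each <ul> to the most recent heading. The function names collectByHeading and formatSections are made up for illustration.

```javascript
// Sketch: group <li> text under the nearest preceding heading.
// Assumption: headings and lists are direct children of the container.
function collectByHeading(root) {
  var sections = [];  // ordered list of { heading, items }
  var current = null; // section of the most recent heading seen

  Array.from(root.children).forEach(function (el) {
    if (/^H[1-6]$/i.test(el.tagName)) {
      // Start a new section; drop a trailing colon such as in "Target:".
      current = { heading: el.textContent.trim().replace(/:$/, ""), items: [] };
      sections.push(current);
    } else if (/^UL$/i.test(el.tagName) && current) {
      Array.from(el.children).forEach(function (li) {
        if (/^LI$/i.test(li.tagName)) {
          current.items.push(li.textContent.trim());
        }
      });
    }
    // Any other element (<br>, <a>, ...) is skipped.
  });
  return sections;
}

// Turn the sections into "Heading: item, item" lines.
function formatSections(sections) {
  return sections.map(function (s) {
    return s.items.length ? s.heading + ": " + s.items.join(", ") : s.heading + ":";
  }).join("\n");
}

// Browser usage; skipped when no document is available.
if (typeof document !== "undefined") {
  var container = document.getElementById("word_content");
  if (container) {
    console.log(formatSections(collectByHeading(container)));
  }
}
```

Because only element children are visited, the loose text at the top of the div (Testing Time, Total Age, and so on) is ignored, and a heading with no list of its own, such as Structure, ends up with an empty item list.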

Upvotes: 1
