Dev Dev

Reputation: 387

Extract data from HTML string in JavaScript

I have a Node.js script that reads HTML from a file as a string, and I would like to extract some data from it. The string (it is a plain string, not a parsed HTML document) is as follows:

<tr><td style="text-align: center;">Initial Filing</td></tr>
                                        
<tr><td>Debtor</td></tr>

    <tr><td class="dName">PO</td></tr>
    <tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>

<tr><td>Secured Party</td></tr>

    <tr><td class="spName">AS</td></tr>
    <tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>
    
<tr><td>Debtor</td></tr>
    <tr><td class="dName">ONE</td></tr>
    <tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>

<tr><td>Secured Party</td></tr>

    <tr><td class="spName">ANY</td></tr>
    <tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>

The JavaScript code I'm using is:

fs.readFile('file.txt', 'utf8', function (err, data) {
        if (err) {
            console.log("Error reading file.txt", err);
            process.exit(1);
        }
        var cleanedHtml = /<tr><td>Debtor<\/td><\/tr>(.*?)<tr><td>Secured Party<\/td><\/tr>/g.exec(html);
        console.log(cleanedHtml[1]);
    });

It throws this error:

 return cleanedHtml[1];
                      ^
TypeError: Cannot read property '1' of null

Is there an issue with my regex? Also, how can I get an end result like this:

PO
CLACKAMAS OR 97015

AS
SPRINGFIELD IL 62708
    
ONE
CLACKAMAS OR 97015

ANY
SPRINGFIELD IL 62708

Thanks.

Upvotes: 0

Views: 4383

Answers (2)

Mamun

Reputation: 68933

If you make sure that the tr elements are inside <table></table> (outside a table, the HTML parser discards stray tr and td tags), then you can parse the string using DOMParser() after reading the file:

Demo:

var strHtml = `
  <table>
    <tr><td style="text-align: center;">Initial Filing</td></tr>

    <tr><td>Debtor</td></tr>

    <tr><td class="dName">PO</td></tr>
    <tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>

    <tr><td>Secured Party</td></tr>

    <tr><td class="spName">AS</td></tr>
    <tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>

    <tr><td>Debtor</td></tr>
    <tr><td class="dName">ONE</td></tr>
    <tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>

    <tr><td>Secured Party</td></tr>

    <tr><td class="spName">ANY</td></tr>
    <tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>
  </table>
  `

var doc = new DOMParser().parseFromString(strHtml, 'text/html');
var els = doc.querySelectorAll('.dName,.spName,.dAddress,.spAddress');
els.forEach((el) => {
  console.log(el.textContent);
});
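
Note that DOMParser is a browser API and is not defined in plain Node.js, which is where the question's script runs. As a rough alternative that needs no DOM at all, the sketch below pulls the class-tagged cells out with a regular expression. It assumes every cell of interest looks exactly like the ones in the question (a td with a single class attribute and no nested tags); extractCells and sample are hypothetical names used only for illustration, and a real HTML parser would be more robust for anything less regular.

```javascript
// Hypothetical helper (not part of the original answer): collect the text of
// every <td> whose class is dName, dAddress, spName or spAddress.
// Assumes the exact markup shape shown in the question.
function extractCells(html) {
  const cellPattern = /<td class="(dName|dAddress|spName|spAddress)">([^<]*)<\/td>/g;
  const cells = [];
  let match;
  // With the g flag, exec() resumes after the previous match on each call
  // and returns null once there are no more matches.
  while ((match = cellPattern.exec(html)) !== null) {
    cells.push({ kind: match[1], text: match[2].trim() });
  }
  return cells;
}

// Shortened copy of the question's input, used here instead of reading file.txt.
const sample = [
  '<tr><td>Debtor</td></tr>',
  '    <tr><td class="dName">PO</td></tr>',
  '    <tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>',
  '<tr><td>Secured Party</td></tr>',
  '    <tr><td class="spName">AS</td></tr>',
  '    <tr><td class="spAddress">SPRINGFIELD IL 62708</td></tr>',
].join('\n');

for (const cell of extractCells(sample)) {
  console.log(cell.text);
}
```

In the question's script, the same function could be called on the data argument inside the fs.readFile callback.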

Upvotes: 3

Tman

Reputation: 11

Should there not be brackets after console.log? Is cleanedHtml a list with more than one element? Otherwise there is no cleanedHtml[1].
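
On why the result is null in the first place: exec() returns null when the pattern does not match, and one likely cause here is that . does not match line breaks by default, so (.*?) cannot span the lines between the Debtor and Secured Party rows. The question's snippet also runs the regex against html, while the callback receives the file contents as data. The sketch below illustrates the line-break point on a shortened copy of the input; text, original and multiline are illustrative names, and [\s\S] is one common way to match any character including newlines.

```javascript
// Shortened copy of the question's input, with the line breaks preserved.
const text = [
  '<tr><td>Debtor</td></tr>',
  '',
  '    <tr><td class="dName">PO</td></tr>',
  '    <tr><td class="dAddress">CLACKAMAS OR 97015</td></tr>',
  '',
  '<tr><td>Secured Party</td></tr>',
].join('\n');

// Pattern from the question: "." stops at line breaks, so nothing matches.
const original = /<tr><td>Debtor<\/td><\/tr>(.*?)<tr><td>Secured Party<\/td><\/tr>/;
console.log(original.exec(text)); // null

// Same pattern with [\s\S] in place of ".", which also matches newlines.
const multiline = /<tr><td>Debtor<\/td><\/tr>([\s\S]*?)<tr><td>Secured Party<\/td><\/tr>/;
const result = multiline.exec(text);

// Guard against null before indexing, which avoids the TypeError.
if (result !== null) {
  console.log(result[1].trim());
}
```

The captured group still contains raw tr/td markup, so a second step is needed to reduce it to the bare names and addresses.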

Upvotes: 0
