Thomas

Reputation: 99

Extract the text values from an HTML response in Node.js

I am trying to extract the values of the following text from an HTML response and store them in variables. So far I have tried Cheerio, but it doesn't seem to work.

HTML:

var htmlbody = <table style="width:100%; border: 1px solid #cccccc; border-collapse: collapse;" border=1 cellspacing="0" cellpadding="4"><tr><td style="background-color: #eeeeee; width: 200px;">Improvement Date (first date)</td><td>Nov 5, 2019 1:57:00 PM UTC</td></tr><tr><td style="background-color: #eeeeee">Document Call existed at</td><td>Nov 5, 2019 3:40:00 PM UTC</td></tr><tr><td style="background-color: #eeeeee">Document creation at</td><td>not available</td></tr><tr><td style="background-color: #eeeeee; width: 200px;">First document sent</td><td>not available</td></tr></table>

What I have tried:

   const cheerio = require('cheerio')
   var html = htmlbody
   const txt = $(html).text()
   console.log(txt)

I want to extract the values below from the HTML, in this exact order, and store each one in its own variable.

Nov 5, 2019 1:57:00 PM UTC
Nov 5, 2019 3:40:00 PM UTC
not available
not available

Note: the HTML snippet I have will not have any class or id assigned.

Upvotes: 0

Views: 686

Answers (1)

Arun Selin

Reputation: 633

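This can be done by loading the HTML into Cheerio and walking the table rows. Note that the code in the question never calls cheerio.load, so $ is undefined there. Please refer to the code below.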

const cheerio = require('cheerio');

var htmlbody = '<table style="width:100%; border: 1px solid #cccccc; border-collapse: collapse;" border=1 cellspacing="0" cellpadding="4"><tr><td style="background-color: #eeeeee; width: 200px;">Improvement Date (first date)</td><td>Nov 5, 2019 1:57:00 PM UTC</td></tr><tr><td style="background-color: #eeeeee">Document Call existed at</td><td>Nov 5, 2019 3:40:00 PM UTC</td></tr><tr><td style="background-color: #eeeeee">Document creation at</td><td>not available</td></tr><tr><td style="background-color: #eeeeee; width: 200px;">First document sent</td><td>not available</td></tr></table>';

const $ = cheerio.load(htmlbody);

var val = {};
// Visit every row of the table in document order.
$('table tr').each(function (i, row) {
    var td = $(row).find('td');
    // The first cell is the label, the second cell is the value.
    val[td.eq(0).text()] = td.eq(1).text();
});
// The extracted values are stored as key-value pairs:
// {
//   'Improvement Date (first date)': 'Nov 5, 2019 1:57:00 PM UTC',
//   'Document Call existed at': 'Nov 5, 2019 3:40:00 PM UTC',
//   'Document creation at': 'not available',
//   'First document sent': 'not available'
// }
console.log(val);

Upvotes: 1
