Reputation: 99
I am trying to extract the values from the following HTML and store each of them in a variable. So far I have tried Cheerio, but it doesn't seem to work.
HTML:
var htmlbody = '<table style="width:100%; border: 1px solid #cccccc; border-collapse: collapse;" border=1 cellspacing="0" cellpadding="4"><tr><td style="background-color: #eeeeee; width: 200px;">Improvement Date (first date)</td><td>Nov 5, 2019 1:57:00 PM UTC</td></tr><tr><td style="background-color: #eeeeee">Document Call existed at</td><td>Nov 5, 2019 3:40:00 PM UTC</td></tr><tr><td style="background-color: #eeeeee">Document creation at</td><td>not available</td></tr><tr><td style="background-color: #eeeeee; width: 200px;">First document sent</td><td>not available</td></tr></table>';
What I have tried:
const cheerio = require('cheerio')
var html = htmlbody
const txt = $(html).text()
console.log(txt)
I want to extract the values below from the HTML, in this exact order, and store each one in its own variable.
Nov 5, 2019 1:57:00 PM UTC
Nov 5, 2019 3:40:00 PM UTC
not available
not available
Note: the HTML snippet I have will not have any class or id attributes assigned.
Upvotes: 0
Views: 686
Reputation: 633
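This can be achieved by loading the HTML with cheerio.load and then walking the table rows: in each row the first cell is the label and the second cell is the value. Please refer to the code below.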
const cheerio = require('cheerio');
var htmlbody = '<table style="width:100%; border: 1px solid #cccccc; border-collapse: collapse;" border=1 cellspacing="0" cellpadding="4"><tr><td style="background-color: #eeeeee; width: 200px;">Improvement Date (first date)</td><td>Nov 5, 2019 1:57:00 PM UTC</td></tr><tr><td style="background-color: #eeeeee">Document Call existed at</td><td>Nov 5, 2019 3:40:00 PM UTC</td></tr><tr><td style="background-color: #eeeeee">Document creation at</td><td>not available</td></tr><tr><td style="background-color: #eeeeee; width: 200px;">First document sent</td><td>not available</td></tr></table>';
const $ = cheerio.load(htmlbody);
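// Select every row of the table (no class or id is needed)
var tr = $('table tr');
var val = {};
for (var i = 0; i < tr.length; i++) {
  // First cell is the label, second cell is the value
  var td = $('td', tr[i]);
  // text() returns the plain text of the cell rather than its inner HTML
  val[$(td[0]).text()] = $(td[1]).text();
}
// The extracted values are stored as key-value pairs: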
// 'Improvement Date (first date)': 'Nov 5, 2019 1:57:00 PM UTC',
// 'Document Call existed at': 'Nov 5, 2019 3:40:00 PM UTC',
// 'Document creation at': 'not available',
// 'First document sent': 'not available'
console.log(val);
Upvotes: 1