edt

Reputation: 22440

How to use Node.js to create modified versions of html documents?

I am trying to do this:

  1. Read the HTML document "myDocument.html" with Node.
  2. Insert the contents of another HTML document, "foo.html", immediately after the opening body tag of myDocument.html.
  3. Insert the contents of yet another HTML document, "bar.html", immediately before the closing body tag of myDocument.html.
  4. Save the modified version of "myDocument.html".

To do the above, I would need to search the DOM with Node to find the opening and closing body tags. How can this be done?

Upvotes: 0

Views: 129

Answers (3)

Peter Lyons

Reputation: 146084

Use the cheerio library, which has a simplified jQuery-ish API.

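var cheerio = require('cheerio');
// myDocumentHTMLString, fooHTMLString and barHTMLString hold the file contents (e.g. from fs.readFileSync).
var $ = cheerio.load(myDocumentHTMLString); // cheerio.load() parses the document and returns a query function
$('body').prepend(fooHTMLString);           // insert right after the opening <body> tag
$('body').append(barHTMLString);            // insert right before </body>
var finalHTML = $.html();                   // serialize the whole modified document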

And just to be clear, since the legions of pro-regex individuals are already appearing in droves: yes, you need a real parser. No, you cannot use a regular expression. Read Stack Overflow co-founder Jeff Atwood's post "Parsing Html The Cthulhu Way".
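A small illustration of why a regular expression is fragile here, using a hypothetical document whose inline script contains the text "</body>". An HTML parser treats everything inside <script> as text until </script>, so that "</body>" is not a tag, but a plain regex cannot tell the difference:

```javascript
// Hypothetical input: an inline script in the body contains the text "</body>".
const doc =
  '<html><body><script>var s = "</body>";</script><p>Hi</p></body></html>';

// Naive approach: insert some content before the first "</body>" the regex finds.
const naive = doc.replace(/<\/body>/i, () => '<footer></footer></body>');

// The insertion lands inside the JavaScript string, not before the real closing tag.
console.log(naive);
```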

Upvotes: 0

Shrey Gupta

Reputation: 5617

Very simply, you can use the built-in file system module that comes with Node.js (var fs = require("fs");). It lets you read the HTML files as strings, insert the other files' contents with string replacement, and finally save the result by writing the file again.

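The advantage is that this solution uses only Node's built-in modules and requires no external libraries. It is also completely faithful to the original HTML file: everything outside the inserted content stays exactly as it was, whereas a parser-based approach re-serializes the whole document.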

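var fs = require('fs');

// Read each file as a UTF-8 string; without an encoding, readFile yields a Buffer, which has no replace().
fs.readFile('myDocument.html', 'utf8', function (err, myDocData) {
  if (err) throw err;
  fs.readFile('foo.html', 'utf8', function (err, fooData) { // reads foo file
    if (err) throw err;
    // replace() returns a new string, so reassign it; a function replacement keeps any "$" in fooData literal.
    myDocData = myDocData.replace(/<body[^>]*>/i, function (tag) { return tag + fooData; }); // adds foo after <body ...>
    fs.readFile('bar.html', 'utf8', function (err, barData) { // reads bar file
      if (err) throw err;
      myDocData = myDocData.replace(/<\/body>/i, function (tag) { return barData + tag; }); // adds bar before </body>
      fs.writeFile('myDocumentNew.html', myDocData, function (err) { // writes new file
        if (err) throw err;
      });
    });
  });
});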
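If the nested callbacks get unwieldy, the same read/replace/write approach can be sketched with the promise-based fs API (require('fs').promises, available in Node.js 10 and later) and async/await. The helper names spliceIntoBody and buildDocument are hypothetical, and like any string-replacement approach this only touches the first <body ...> tag and the first </body> in the file:

```javascript
const fsp = require('fs').promises;

// Insert `head` right after the opening <body ...> tag and `tail` right before
// the first "</body>". Function replacements keep any "$" in the inserted text literal.
function spliceIntoBody(html, head, tail) {
  return html
    .replace(/<body[^>]*>/i, (tag) => tag + head)
    .replace(/<\/body>/i, (tag) => tail + tag);
}

// Read the three files in parallel, splice them together, and write the result.
async function buildDocument(docPath, fooPath, barPath, outPath) {
  const [doc, foo, bar] = await Promise.all([
    fsp.readFile(docPath, 'utf8'),
    fsp.readFile(fooPath, 'utf8'),
    fsp.readFile(barPath, 'utf8'),
  ]);
  await fsp.writeFile(outPath, spliceIntoBody(doc, foo, bar));
}
```

For example: buildDocument('myDocument.html', 'foo.html', 'bar.html', 'myDocumentNew.html').catch(console.error);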

Upvotes: 1

Andrew

Reputation: 5340

A simple, though not fully reliable, way to do it:

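var fs = require('fs');
function read(file) { return fs.readFileSync(file, 'utf8'); } // read a whole file as a string
var str = read('myDocument.html');
str = str.replace(/(<body.*?>)/i, "$1" + read('foo.html'));   // $1 is the captured <body ...> tag
str = str.replace(/(<\/body>)/i, read('bar.html') + '$1');    // $1 is the captured </body> tag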

It will not work if myDocument.html contains "<body" or "</body>" more than once (for example inside inline JavaScript), and foo.html and bar.html must not contain "$" replacement patterns such as "$1" or "$&".
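The "$1" caveat comes from String.prototype.replace: in a replacement string, sequences such as $1, $& and $$ are special patterns. A small demonstration with hypothetical inputs, plus one common workaround, passing a function as the second argument, whose return value is inserted verbatim:

```javascript
// Hypothetical inputs: a page, and a snippet that happens to contain "$1".
const page = '<html><body class="x"></body></html>';
const snippet = 'Total: $1.00';

// String replacement: "$1" inside the snippet is expanded to the captured <body> tag.
const broken = page.replace(/(<body.*?>)/i, '$1' + snippet);

// Function replacement: the returned string is inserted literally.
const fixed = page.replace(/(<body.*?>)/i, (match, bodyTag) => bodyTag + snippet);

console.log(broken);
console.log(fixed);
```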

If you can edit the content of myDocument.html, you can instead leave a "placeholder" there as an HTML comment, like

<!--foo.html-->

Then it's easy: just replace the placeholder with the file's contents.
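A minimal sketch of the placeholder idea, assuming myDocument.html contains comments named after the files, such as <!--foo.html--> and a hypothetical <!--bar.html-->. The function name fillPlaceholders is illustrative. Using split/join instead of a regular expression treats the placeholder text literally, so neither regex escaping nor "$" patterns come into play:

```javascript
const fs = require('fs');

// Replace every "<!--name-->" comment in `html` with the contents of the file `name`.
// `readFile` can be swapped out so the function can be tried without touching the disk.
function fillPlaceholders(html, fileNames, readFile = (name) => fs.readFileSync(name, 'utf8')) {
  let out = html;
  for (const name of fileNames) {
    out = out.split('<!--' + name + '-->').join(readFile(name));
  }
  return out;
}
```

For example: fs.writeFileSync('myDocumentNew.html', fillPlaceholders(fs.readFileSync('myDocument.html', 'utf8'), ['foo.html', 'bar.html']));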

Upvotes: 0
