JohiOakey

Reputation: 143

Turn an HTML string into an organized object

Lang: Node.js

I'm using a text editor and I get an output string like this:

<p>This is <strong>a <a href="#">test</a></strong></p>

The tags could be different, such as H1, H2, etc., but nothing more special than ordinary HTML text tags.

Now I want to turn that string into an object that I can work with and send to my database. Ideally it would be transformed into something like this:

[{type: "text", text: "This is ", bold: false}, {type: "text", text: "a ", bold: true}, {type: "link", text: "test", bold: true, href: "#"}]

and so on.

I tried a regex approach: splitting the string and applying all sorts of logic to turn it into a structured object. That can't be the best way to do it, though, since it would fail if, for example, I later wrote <h1>Test</h1> in the middle of the text.

How would you approach this?

Upvotes: 0

Views: 811

Answers (2)

emi

Reputation: 3070

If you want to keep it simple, a parser such as jsdom, or htmlparser2 together with domhandler, will help. For example, using htmlparser2 and domhandler (taken from one of my apps, where it parses a UL/LI/A menu tree):

// Parsers helpers
import { Parser } from 'htmlparser2';
import { DomHandler } from 'domhandler';

// Get all text contents, recursively
const getAllText = (node) => {
  return node.children.map( n => {
    if (n.type === 'text') {
      return n.data.trim(); // trim() takes no arguments; it strips all leading/trailing whitespace
    }

    // Discard `small` tags
    if (n.name === 'small') {
      return ''
    }

    return getAllText(n);
  }).join('')
}

// Parses HTML data containing a UL/LI/A tree
const parseMenu = (data) => {

  const parseLink = (link) => {
    const name = getAllText(link);
    const code = link.attribs['data-value']?.trim();
    return {
      name,
      ...(code ? {code} : {}),
    }
  }

  const parseLi = (li) => {
    const ul = li.children.find(({type, name}) => type === 'tag' && name === 'ul' );
    const link = li.children.find(({type, name}) => type === 'tag' && name === 'a' );
    return {
      ...(link ? parseLink(link) : {}),
      ...(ul ? {children:  parseUl(ul)} : {}),
    }
  }

  const parseUl = (ul) => {
    return ul.children.filter(({type, name}) => type === 'tag' && name === 'li' ).map( child => {
      return parseLi(child);
    });
  }

  let result;
  const handler = new DomHandler( (error, dom) => {
    if (error) {
      // Handle error
    } else {
      // Parsing completed, do something
      result = parseUl(dom[0]);
    }
  });

  const parser = new Parser(handler);
  parser.write(data);
  parser.end();
  return result;
}
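
For the exact output shape asked for in the question, the same kind of recursive walk can be adapted. What follows is only a minimal sketch, not tested against a real editor: `flattenNodes` and `sampleDom` are names made up for illustration, and `sampleDom` is a hand-written stand-in for the node tree a parser would hand you (nodes with `type`, `name`, `attribs`, `children` and `data`, as in the code above). It assumes only `strong`/`b` mean bold and only `a` creates a link.

```javascript
// Flatten a domhandler-style node tree into a list of styled text runs.
// `marks` carries the formatting inherited from the ancestors of `nodes`.
const flattenNodes = (nodes, marks = { bold: false, href: null }) => {
  const runs = [];
  for (const node of nodes) {
    if (node.type === 'text') {
      if (node.data === '') continue; // skip empty text nodes
      runs.push(
        marks.href !== null
          ? { type: 'link', text: node.data, bold: marks.bold, href: marks.href }
          : { type: 'text', text: node.data, bold: marks.bold }
      );
    } else if (node.type === 'tag') {
      // Copy the marks so siblings are not affected by this tag
      const next = { ...marks };
      if (node.name === 'strong' || node.name === 'b') next.bold = true;
      if (node.name === 'a') next.href = node.attribs?.href ?? '';
      runs.push(...flattenNodes(node.children ?? [], next));
    }
  }
  return runs;
};

// Hand-written stand-in (assumption) for the parsed form of:
// <p>This is <strong>a <a href="#">test</a></strong></p>
const sampleDom = [
  { type: 'tag', name: 'p', attribs: {}, children: [
    { type: 'text', data: 'This is ' },
    { type: 'tag', name: 'strong', attribs: {}, children: [
      { type: 'text', data: 'a ' },
      { type: 'tag', name: 'a', attribs: { href: '#' }, children: [
        { type: 'text', data: 'test' },
      ] },
    ] },
  ] },
];

const runs = flattenNodes(sampleDom);
console.log(runs);
```

Because the formatting is passed down as the walk descends, nesting order does not matter, and other tags such as `h1` could probably be supported by adding one more field to `marks` rather than by rewriting the logic.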

Upvotes: 1

Ivan Kolyhalov

Reputation: 1002

Use the cheerio library (or any other HTML parser library of your choice) and work with the DOM node objects as you wish.

Upvotes: 0
