Reputation: 143
Lang: Node.js
I'm using a text editor and I get an output string like this:
<p>This is <strong>a <a href="#">test</a></strong></p>
It could contain other HTML tags like H1, H2, etc., but nothing more special than ordinary HTML text tags.
Now I want to turn that string into an object that I can work with and send to my database. Ideally it would be transformed into something like this:
[{type: "text", text: "This is ", bold: false}, {type: "text", text: "a ", bold: true}, {type: "link", text: "test", bold: true, href: "#"}]
and so on.
I tried a regex approach, splitting the string and applying all sorts of logic to turn it into a structured object, but that can't be the best way to do it, since it will fail if I later write, for example, <h1>Test</h1> in the middle of the text.
How would you approach this?
Upvotes: 0
Views: 811
Reputation: 3070
If you want to go easy, jsdom or htmlparser2 and domhandler would help with that. For example, using htmlparser2 and domhandler (from some of my apps):
// Parser helpers
import { Parser } from 'htmlparser2';
import { DomHandler } from 'domhandler';

// Get all text contents, recursively
const getAllText = (node) => {
  // Nodes without children (e.g. comments) contribute no text
  return (node.children ?? []).map((n) => {
    if (n.type === 'text') {
      // String.prototype.trim() takes no arguments; it strips surrounding whitespace
      return n.data.trim();
    }
    // Discard `small` tags
    if (n.name === 'small') {
      return '';
    }
    return getAllText(n);
  }).join('');
};

// Parses HTML data containing a UL/LI/A tree
const parseMenu = (data) => {
  // Turns an <a> element into { name, code? }
  const parseLink = (link) => {
    const name = getAllText(link);
    const code = link.attribs['data-value']?.trim();
    return {
      name,
      ...(code ? { code } : {}),
    };
  };

  // Turns an <li> into its link data plus any nested <ul> as `children`
  const parseLi = (li) => {
    const ul = li.children.find(({ type, name }) => type === 'tag' && name === 'ul');
    const link = li.children.find(({ type, name }) => type === 'tag' && name === 'a');
    return {
      ...(link ? parseLink(link) : {}),
      ...(ul ? { children: parseUl(ul) } : {}),
    };
  };

  // Turns a <ul> into an array of parsed <li> items
  const parseUl = (ul) => {
    return ul.children
      .filter(({ type, name }) => type === 'tag' && name === 'li')
      .map((child) => parseLi(child));
  };

  let result;
  const handler = new DomHandler((error, dom) => {
    if (error) {
      // Handle error
    } else {
      // Parsing completed; this assumes the first top-level node is the <ul>
      result = parseUl(dom[0]);
    }
  });
  const parser = new Parser(handler);
  parser.write(data);
  parser.end();
  return result;
};
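The same tree-walking idea can be pointed at the structure asked for in the question. What follows is only a minimal sketch, not a definitive implementation: it assumes nodes shaped like the ones used above (`type`, `name`, `data`, `attribs`, `children`), treats only `strong` and `b` as bold, and uses a hand-built tree so it runs without any library. `flattenNodes` and `BOLD_TAGS` are hypothetical names chosen for illustration.

```javascript
// Tags that switch on the `bold` flag (an assumption; extend as needed).
const BOLD_TAGS = new Set(['strong', 'b']);

// Walks a list of DOM-like nodes and returns a flat array of text runs.
// `ctx` carries the formatting inherited from ancestor tags.
const flattenNodes = (nodes, ctx = { bold: false, href: null }) => {
  const runs = [];
  for (const node of nodes) {
    if (node.type === 'text') {
      if (node.data === '') continue; // skip empty text nodes
      runs.push(
        ctx.href !== null
          ? { type: 'link', text: node.data, bold: ctx.bold, href: ctx.href }
          : { type: 'text', text: node.data, bold: ctx.bold }
      );
    } else if (node.type === 'tag') {
      // Descend with the formatting this tag adds on top of its ancestors'.
      const next = {
        bold: ctx.bold || BOLD_TAGS.has(node.name),
        href: node.name === 'a' ? (node.attribs?.href ?? '') : ctx.href,
      };
      runs.push(...flattenNodes(node.children ?? [], next));
    }
    // Other node types (comments, directives) are ignored.
  }
  return runs;
};

// Hand-built tree for: <p>This is <strong>a <a href="#">test</a></strong></p>
const tree = [
  { type: 'tag', name: 'p', attribs: {}, children: [
    { type: 'text', data: 'This is ' },
    { type: 'tag', name: 'strong', attribs: {}, children: [
      { type: 'text', data: 'a ' },
      { type: 'tag', name: 'a', attribs: { href: '#' }, children: [
        { type: 'text', data: 'test' },
      ] },
    ] },
  ] },
];

const runs = flattenNodes(tree);
console.log(JSON.stringify(runs, null, 2));
```

With htmlparser2, a tree of this shape could come from `parseDocument(html).children` or from the DomHandler callback shown above. Because the walk is recursive, an `<h1>` in the middle of the text does not break it; recording the block type (for example a `block: 'h1'` field) would be one more entry in `ctx`.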
Upvotes: 1
Reputation: 1002
Use the cheerio library (or any other HTML parser library of your choice) and work with the DOM node objects as you wish.
Upvotes: 0