George Welder

Reputation: 4055

Turn html text into markdown manually (javascript / nodejs)

I'm a bit stuck. I have scraped a website and would now like to convert it into markdown. My html looks like this:

Some text more text, and more text. Some text more text, and more text. 
Once in a while  <span class="bold">something is bold</span>. 
Then some more text. And <span class="bold">more bold stuff</span>.

There are html-to-markdown modules available; however, they would only work if the text <b> looked like this </b>.

How could I go through the html and, every time I find a span that is supposed to make something bold, turn that piece of html into bold markdown, that is, make it **look like this**?

Upvotes: 0

Views: 2557

Answers (2)

Yi Kai

Reputation: 640

Try to-markdown (https://github.com/domchristie/to-markdown), an HTML-to-Markdown converter written in JavaScript.

It can be extended by passing in an array of converters to the options object:

toMarkdown(stringOfHTML, { converters: [converter1, converter2, …] });

In your case, the converter can be:

{
    filter: 'span',
    replacement: function (content) {
        return '**' + content + '**';
    }
}

Refer to its readme for more details.
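
Note that filter: 'span' matches every span, not only the bold ones. If the scraped html also contains other spans, the filter can be a function instead. A minimal sketch, assuming to-markdown passes the DOM node to filter (as its readme describes) and that bold text is always marked with the class bold. The name boldSpanConverter is made up for illustration:

```javascript
// Hypothetical converter: bolds only spans whose class list contains "bold".
var boldSpanConverter = {
    // Called with each DOM node; return true to apply this converter.
    filter: function (node) {
        return node.nodeName === 'SPAN' &&
            /(^|\s)bold(\s|$)/.test(node.className || '');
    },
    // Wrap the already-converted inner content in Markdown bold markers.
    replacement: function (content) {
        return '**' + content + '**';
    }
};

// Intended use, assuming to-markdown is installed (npm install to-markdown):
// var toMarkdown = require('to-markdown');
// var markdown = toMarkdown(stringOfHTML, { converters: [boldSpanConverter] });
```

Spans with any other class are then left to the library's default handling.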

Upvotes: 2

Bill Bell

Reputation: 21663

Notepad++ is an open-source editor that supports regex. This picture shows the basic idea.

You know how to use an editor to find and replace strings. In an editor like Notepad++ you can search for string patterns, replace parts of each match, and keep the rest. In your case, you want to find strings that are framed by HTML markup. The regex in the 'Find what' box expresses that, with the special notation ([^<]*) meaning: capture zero or more characters other than '<' for use in the replacement string. The 'Replace with' box uses what was captured (as \1) in the expression **\1**, which gives you the text you want in the file. All that remains is to click 'Replace All'.

[Screenshot: using Notepad++]

To do this you need to install Notepad++ and learn some basic Perl-style regex. To open this dialogue box, press Ctrl+H. Of course, if you get it wrong, there's always Ctrl+Z.
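
The same find-and-replace can also be done directly in JavaScript, which is what the question asks for. A minimal sketch, assuming the pattern is <span class="bold">([^<]*)</span>. The screenshot's exact regex is not reproduced here, so this pattern is a guess based on the description, and the helper name boldSpansToMarkdown is made up:

```javascript
// Convert <span class="bold">text</span> into **text**.
// Assumes the attribute is written exactly as class="bold" and that the
// span holds plain text only (no nested tags), as in the question's sample.
function boldSpansToMarkdown(html) {
    // ([^<]*) captures everything up to the next '<'; $1 re-inserts it.
    return html.replace(/<span class="bold">([^<]*)<\/span>/g, '**$1**');
}

var sample = 'Once in a while <span class="bold">something is bold</span>. ' +
    'And <span class="bold">more bold stuff</span>.';

console.log(boldSpansToMarkdown(sample));
// prints: Once in a while **something is bold**. And **more bold stuff**.
```

Like the Notepad++ version, this only works while the markup stays this regular. Spans with extra attributes or nested tags would need a real HTML parser.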

Upvotes: -1
