Ruby: Parse simple markdown files (with similar but not identical structure) and fill their contents into an object's attributes

Question

I have a folder full of markdown files. I want to read each of them into the following Ruby object:

class File
  attr_accessor :title, :description, :content
end
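A side note on the class itself: `File` is already one of Ruby's core classes, so `class File ... end` reopens it instead of defining a new type, and `File.new` expects a path to open. Below is a minimal sketch of a standalone container with the same three attributes. It uses a `Struct`, and `MarkdownDoc` is a hypothetical name; any name that doesn't clash with a core class would work.

```ruby
# MarkdownDoc is a hypothetical name; any name that does not clash with
# Ruby's core classes works. keyword_init: true allows named arguments,
# and attributes that are not passed default to nil.
MarkdownDoc = Struct.new(:title, :description, :content, keyword_init: true)

doc = MarkdownDoc.new(title: "This is the title", content: "## This is an h2")
doc.description = "This is some description." # Struct members are read/write, like attr_accessor
puts doc.title
```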

The markdown files usually look like this:

# This is the title

This is some description.

And even more description.

## This is an h2

Bla bla.

## This is another h2

More bla bla.

### This is even an h3

Again, more bla bla.

## Again, an h2

etc. etc.

This should result in this Ruby object:

File:
  h1: "This is the title"
  description: "This is some description.

And even more description."
  content: "## This is an h2...etc. etc."

To fill the Ruby object's attributes with the file's contents, I could simply use a regular expression that extracts the title (the first H1), the description (the text between the H1 and the following H2), and the content (everything else).
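For the standard layout, that regex approach could look roughly like the sketch below. It assumes the file has exactly that shape: an `# ` line first, then description paragraphs, then content starting at the first line that begins with `## `. It returns `nil` for any file that deviates. `FULL_LAYOUT` and `parse_full` are illustrative names.

```ruby
# Only handles the "full" layout: an H1 line, then the description,
# then content starting at the first "## " line.
# /m makes "." match newlines; in Ruby, "^" always matches at a line start.
FULL_LAYOUT = /\A# (?<title>[^\n]+)\n+(?<description>.*?)\n+(?<content>^## .*)\z/m

def parse_full(text)
  m = FULL_LAYOUT.match(text)
  return nil unless m # the file does not follow the full layout

  { title: m[:title], description: m[:description], content: m[:content] }
end
```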

But the files do not always look exactly like this:

- Sometimes there is no H1 (in that case, the file name will be used as the title)
- Sometimes there is no description
- Sometimes there is no content

These exceptions can also occur in combination, e.g. a file without an H1 and without a description:

## This is an h2

Bla bla.

## This is another h2

More bla bla.

This should result in this Ruby object:

File:
  h1: nil
  description: nil
  content: "## This is an h2...More bla bla."

Or a file with an H1 but no description:

# This is the title

## This is an h2

Bla bla.

This should result in this Ruby object:

File:
  h1: "This is the title"
  description: nil
  content: "## This is an h2...Bla bla."

Or a file with no H1 but with a description:

This is a description.

Some more description.

## This is an h2

Bla bla.

This should result in this Ruby object:

File:
  h1: nil
  description: "This is a description...Some more description."
  content: "## This is an h2...Bla bla."
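A single regular expression can cover all of these cases if every part is made optional. This sketch assumes that a title is an `# ` line at the very top of the file (after optional blank lines), and that content begins at the first line starting with `##`, so an H3 appearing before any H2 would also start the content. `LAYOUT`, `presence`, and `parse_markdown` are hypothetical names. Missing parts come back as `nil`, and the `fallback_title:` keyword lets the caller supply the file name when there is no H1.

```ruby
# Optional H1 line, then an optional description, then optional content.
# Content starts at the first line beginning with "##" (H2 or deeper).
# /m makes "." match newlines; every group is optional, so it always matches.
LAYOUT = /\A\s*(?:# (?<title>[^\n]*)(?:\n|\z))?(?<description>.*?)(?<content>^##.*)?\z/m

# Strips whitespace and turns empty or missing parts into nil.
def presence(str)
  s = str.to_s.strip
  s.empty? ? nil : s
end

def parse_markdown(text, fallback_title: nil)
  m = LAYOUT.match(text)
  {
    title: presence(m[:title]) || fallback_title,
    description: presence(m[:description]),
    content: presence(m[:content])
  }
end
```

For a file on disk, the file-name fallback could be passed as `parse_markdown(File.read(path), fallback_title: File.basename(path, ".md"))`. Note that `#` lines inside fenced code blocks are not treated specially by this pattern.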

I wonder whether I can do this with a single fancy regular expression (I'm no regex expert), or whether I should split it into several processing steps. I asked a similar question here: Markdown: Regex to find all content following an heading #2 (but stop at another heading #2), but I couldn't get that regex to work in Ruby with the exceptions described above.
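The multi-step route could be sketched as follows, working over the file's lines instead of one large regex. It assumes an H1 can only be the first non-blank line and that content starts at the first line beginning with `##`. `parse_in_steps` is a hypothetical name, and `fallback_title:` is an optional way to pass in the file name.

```ruby
# Step-by-step alternative: work on an array of lines instead of one big regex.
def parse_in_steps(text, fallback_title: nil)
  lines = text.lines.map(&:chomp)

  # Step 1: drop leading blank lines, then take the H1 if the first line is one.
  lines.shift while lines.first&.strip&.empty?
  title = nil
  title = lines.shift.delete_prefix("# ").strip if lines.first&.start_with?("# ")

  # Step 2: everything before the first "##" heading is the description.
  first_heading = lines.index { |line| line.start_with?("##") } || lines.length
  description = lines[0...first_heading].join("\n").strip

  # Step 3: the rest, starting at that heading, is the content.
  content = lines[first_heading..].join("\n").strip

  {
    title: title || fallback_title,
    description: description.empty? ? nil : description,
    content: content.empty? ? nil : content
  }
end
```

A line-based version is arguably easier to extend than a single pattern, for example to skip `#` lines inside fenced code blocks. To process the whole folder, something like `Dir.glob("posts/*.md").map { |path| parse_in_steps(File.read(path), fallback_title: File.basename(path, ".md")) }` would work, where `posts/` is a placeholder directory.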

Any ideas on how to solve this problem are very welcome. Thank you.

PS: I also thought about parsing the markdown with a markdown parser and then using Nokogiri or something similar to parse the resulting HTML. But that feels like way too much overhead for such a basically simple requirement.
