warchest

Reputation: 437

Using JavaScript to parse a file with regex and convert the info into JSON

How can I organize the following data format into JSON? The data is a simple text file that looks like:

<--Header Title-->
Some block of info here
<--Header Title-->
Some block of info here
<--Header Title-->
Some block of info here

There are some tricky bits, like:

a) The info block can itself contain <--Header Title-->:

<--Header Title-->
I am info for <--Header Title-->
<--Header Title-->
This <--Header Title--> is finished
<--Header Title-->
<--Header Title--> contains the following:
stuff1
stuff2
stuff3

b) The info block can be empty or consist only of whitespace/newlines:

<--Header Title-->
<--Header Title-->


<--Header Title-->
Info text here

c) The formatting of the info block should be preserved, unless it is only whitespace/newlines as in (b). So the following should keep its leading and trailing whitespace/newlines:

<--Header Title-->

More info about blah

blah blah blah
blah blah

<--Header Title-->
Another info about blah

All in all, I'd like to convert this into JSON for easy retrieval. A simple example:

<-- Option 1 -->
Nice text
<-- Option 2-->


<--Final stuff-->
Listing all
of
the
text

<--Header Title-->
I am info for <--Header Title-->
<--Header Title-->
This <--Header Title--> is finished
<--Header Title-->
<--Header Title--> contains the following:
stuff1
stuff2
stuff3

Expected JSON:

{
  "data":
    [
        {"Option 1": "Nice text"},
        {"Option 2": ""},
        {"Final stuff": "Listing all\nof\nthe\ntext\n"},
        {"Header Title": "I am info for <--Header Title-->"},
        {"Header Title": "This <--Header Title--> is finished"},
        {"Header Title": "<--Header Title--> contains the following:\nstuff1\nstuff2\nstuff3"}
    ]
}

My current regex is:

\<\-\-(.*)\-\-\>\n(.*)

But this only captures the first occurrence, and only if (a) doesn't occur.

Upvotes: 1

Views: 74

Answers (1)

Pranav C Balan

Reputation: 115242

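You can use the regex /<--([\w\s]+)-->([\s\S]*?)(?=\n<--|$)/g and loop over its matches with exec(). The lazy [\s\S]*? captures the block, newlines included, up to the next line that starts with <-- or the end of the string: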

var str = `<-- Option 1 -->
Nice text
<-- Option 2-->


<--Final stuff-->
Listing all
of
the
text

<--Header Title-->
I am info for <--Header Title-->
<--Header Title-->
This <--Header Title--> is finished
<--Header Title-->
<--Header Title--> contains the following:
stuff1
stuff2
stuff3`;


var reg = /<--([\w\s]+)-->([\s\S]*?)(?=\n<--|$)/g,
  m,
  res = { // object to store result
    data: []
  };

while ((m = reg.exec(str)) !== null) {
  var data = {};
  data[m[1].trim()] = m[2].trim(); // trim the title and the block, then store them as one key/value pair
  res.data.push(data); // append the pair to the result array
}
document.write('<pre>' + JSON.stringify(res, null, 3) + '</pre>');

Regex explanation: [figure: regular expression visualization]
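Two caveats when comparing against the expected JSON in the question: trim() removes the leading/trailing newlines that (c) asks to keep, and the lookahead \n<-- treats any line that begins with <-- as the start of a new block, so the last sample block ("<--Header Title--> contains the following: ...") gets split in two. If that matters, one possible variation is to go line by line and treat a line as a header only when the whole line is <--Title-->. This is just a minimal sketch under that assumption; parseBlocks and finishBlock are made-up names, not part of any library.

```javascript
// Hypothetical helper: a line counts as a header only if the ENTIRE line
// is <--Title--> (optionally followed by trailing whitespace).
function parseBlocks(text) {
  var headerLine = /^<--(.*)-->\s*$/;
  var data = [];
  var current = null; // block being collected: { title, lines }

  text.split(/\r?\n/).forEach(function (line) {
    var m = headerLine.exec(line);
    if (m) {
      // A new header closes the previous block, if there is one.
      if (current) data.push(finishBlock(current));
      current = { title: m[1].trim(), lines: [] };
    } else if (current) {
      // Any other line belongs to the current block.
      // Lines before the first header are ignored.
      current.lines.push(line);
    }
  });
  if (current) data.push(finishBlock(current));
  return { data: data };
}

// Hypothetical helper: turns a collected block into a { title: body } pair.
function finishBlock(block) {
  var body = block.lines.join('\n');
  var entry = {};
  // Whitespace-only blocks become "" (case b); anything else is kept
  // exactly as written, surrounding newlines included (case c).
  entry[block.title] = /\S/.test(body) ? body : '';
  return entry;
}

// Usage: the second "<--B-->" is not alone on its line, so it stays in the block.
var input = '<--A-->\nfirst\n\n<--B-->\n<--B--> is inline here';
console.log(JSON.stringify(parseBlocks(input), null, 3));
```

The trade-off is that a header must sit on its own line; a line such as "<--A--> text <--B-->" would be read as a single header, so adjust headerLine if your titles can look like that.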

Upvotes: 1
