Reputation: 93
I need to parse the following string (I am parsing a PDF and would like to avoid third-party packages):
/Type /Pages /MediaBox [0 0 612 792] /Count 9 /Kids [ 5 0 R 355 0 R ]
I am using JavaScript:
String.split(' ');
The output I would like to get is [ '/Type', '/Pages', '/MediaBox', '[0 0 612 792]', '/Count', '9', '/Kids', '[ 5 0 R 355 0 R]' ]
Instead, splitting on spaces results in the following output: [ '<<', '/Type', '/Pages', '/MediaBox', '[0', '0', '612', '792]',
Specifically, I would like to treat '[' and ']' as delimiters, so that each bracketed group stays together as one item and reads '[ 5, 0, R, 355, 0, R]'.
The final expected result is this:
I am trying to see if I can address this with a regular expression, and I am currently stuck.
Upvotes: 0
Views: 40
Reputation: 2555
This regex should take care of it:
var input = "/Type /Pages /MediaBox [0 0 612 792] /Count 9 /Kids [ 5 0 R 355 0 R ]";
// Try to match a whole [...] group first; otherwise match a run of non-whitespace characters.
var result = input.match(/(\[[^\]]+\]|\S+)/g);
console.log(result);
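As an explanation: it matches either a [ followed by every character that is not ] up to the closing ] (\[[^\]]+\]), OR a sequence of non-whitespace characters (\S+). The bracket alternative comes first, so a bracketed group is kept together as a single match.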
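As a rough sketch of how this could be wrapped for reuse: the function names tokenize and arrayElements below are made up for illustration and are not part of any library. The regex is the same one without the outer capture group, which is not needed with match and the g flag. This assumes arrays are not nested and that no ']' appears inside a bracketed group. Real PDF objects can contain nested arrays and parenthesized strings that include brackets, and this sketch does not handle those.

```javascript
// Hypothetical helper names; only the regex idea comes from the answer above.
function tokenize(input) {
  // String.prototype.match returns null when nothing matches, so fall back to [].
  return input.match(/\[[^\]]+\]|\S+/g) || [];
}

function arrayElements(token) {
  // Strip the surrounding brackets and split what is left on whitespace.
  const inner = token.slice(1, -1).trim();
  return inner === '' ? [] : inner.split(/\s+/);
}

const tokens = tokenize('/Type /Pages /MediaBox [0 0 612 792] /Count 9 /Kids [ 5 0 R 355 0 R ]');
console.log(tokens);
// The last token is the /Kids array; split it into its individual elements.
console.log(arrayElements(tokens[tokens.length - 1]));
```

Keeping the bracketed group as one token and splitting it in a second step means the elements come back as a real array rather than as a comma-joined string.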
Upvotes: 2
Reputation: 24955
You can use a regex that matches the [...] groups and replace the spaces inside each group with commas. Then you just have to split the string by spaces:
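var s = "/Type /Pages /MediaBox [0 0 612 792] /Count 9 /Kids [ 5 0 R 355 0 R ]";
// Match each [...] group (or an unclosed "[" that runs to the end of the string).
var arr_reg = /\[(.*?)(?:\]|$)/g;
s = s.replace(arr_reg, function (str) {
  // Drop the first and last characters (the brackets), then turn inner spaces into commas.
  str = str.substring(1, str.length - 1);
  return "[" + str.trim().replace(/ /g, ',') + "]";
});
// The bracketed groups no longer contain spaces, so a plain split keeps them intact.
console.log(s.split(' '));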
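One thing to watch in the snippet above: when a group is unclosed and the match ends at the end of the string, substring(1, str.length - 1) also drops the last real character. A possible variant, sketched here under the assumption that every '[' has a matching ']' and that arrays are not nested, uses the capture group passed to the replace callback instead. The name commaJoinArrays is hypothetical.

```javascript
// Hypothetical variant of the replace-then-split approach above.
function commaJoinArrays(s) {
  // The second callback argument is the text captured between [ and ].
  const joined = s.replace(/\[([^\]]*)\]/g, function (match, inner) {
    return '[' + inner.trim().split(/\s+/).join(',') + ']';
  });
  // Bracketed groups now contain no whitespace, so splitting on whitespace is safe.
  return joined.trim().split(/\s+/);
}

const parts = commaJoinArrays('/Type /Pages /MediaBox [0 0 612 792] /Count 9 /Kids [ 5 0 R 355 0 R ]');
console.log(parts);
```

Splitting on /\s+/ rather than a single space also tolerates tabs, newlines, and repeated spaces between tokens.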
Upvotes: 1