Reputation: 151
I've got a PDF file converted into a huge string of over 1,000,000 characters. The string contains dates in the format dd/mm/yyyy. I want to split the string at the dates into smaller strings. I tried the following:
var sectioned = hugeString.split(/^(0?[1-9]|[12][0-9]|3[01])[\/](0?[1-9]|1[012])[\/\-]\d{4}$/g);
But it's not working. I also tried hugeString.match(), but got no good result there either.
Is it even possible to accomplish this by string functions or should I think of a different approach?
String snippet:
....Section: 2 Interpretation E.R. 2 of 2012 02/08/2012 .....
Upvotes: 1
Views: 207
Reputation: 626845
You may remove the anchors and the g modifier (it is redundant with split), and use non-capturing groups so that the dates are not output in the results as well. Wrap the pattern in a lookahead, (?=PATTERN HERE), if you need to split while keeping the dates in the resulting chunks. However, if you prefer this approach, make sure there are no optional 0s at the beginning of the pattern, or you might get redundant elements in the result.
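var s = "....Section: 2 Interpretation E.R. 2 of 2012 02/08/2012 ..... ";
// Split on the dates themselves: the dates are dropped from the result
var res = s.split(/(?:0?[1-9]|[12][0-9]|3[01])[\/-](?:0?[1-9]|1[012])[\/-]\d{4}/);
console.log(res);
// Split at the position before each date (lookahead): the dates stay in the chunks
res = s.split(/(?=(?:0[1-9]|[12][0-9]|3[01])[\/-](?:0[1-9]|1[012])[\/-]\d{4})/);
console.log(res);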
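To illustrate the warning about optional 0s in the lookahead version, here is a minimal sketch. The sample string and the variable names are made up for illustration; only the dd/mm/yyyy format comes from the question.

```javascript
// Hypothetical sample string (an assumption, not taken from the real PDF text).
const sample = "intro 02/08/2012 middle 15/11/2013 end";

// Lookahead with optional leading zeros: the zero-width match succeeds both
// before "02/08/2012" and before "2/08/2012" (and likewise before "15/..."
// and "5/..."), so the first digit of each date is split off on its own.
const withOptionalZero = sample.split(
  /(?=(?:0?[1-9]|[12][0-9]|3[01])[\/-](?:0?[1-9]|1[012])[\/-]\d{4})/
);
console.log(withOptionalZero);
// [ 'intro ', '0', '2/08/2012 middle ', '1', '5/11/2013 end' ]

// Lookahead with mandatory two-digit day and month: one split per date.
const strict = sample.split(
  /(?=(?:0[1-9]|[12][0-9]|3[01])[\/-](?:0[1-9]|1[012])[\/-]\d{4})/
);
console.log(strict);
// [ 'intro ', '02/08/2012 middle ', '15/11/2013 end' ]
```

The strict variant assumes every date in the text is written with two-digit day and month; dates such as 2/8/2012 would not produce a split point.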
Note that you also had a [\/] subpattern without - in the pattern, while the other separator character class contained both characters. I suggest using [\/-] in both cases.
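For comparison, this small sketch shows how the anchors and the capturing groups from the original attempt affect the result of split. The sample text and variable names are hypothetical and chosen only for illustration.

```javascript
// Hypothetical sample text (an assumption); only the date format is from the question.
const text = "first 02/08/2012 second";

// Original attempt: ^ and $ require the whole string to be a single date,
// so nothing matches and split returns the input as one element.
const anchored = text.split(
  /^(0?[1-9]|[12][0-9]|3[01])[\/](0?[1-9]|1[012])[\/\-]\d{4}$/g
);
console.log(anchored); // [ 'first 02/08/2012 second' ]

// Anchors removed, capturing groups kept: split inserts each captured
// group (day and month) into the result between the chunks.
const capturing = text.split(
  /(0?[1-9]|[12][0-9]|3[01])[\/-](0?[1-9]|1[012])[\/-]\d{4}/
);
console.log(capturing); // [ 'first ', '02', '08', ' second' ]

// Non-capturing groups: only the text between the dates remains.
const nonCapturing = text.split(
  /(?:0?[1-9]|[12][0-9]|3[01])[\/-](?:0?[1-9]|1[012])[\/-]\d{4}/
);
console.log(nonCapturing); // [ 'first ', ' second' ]
```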
Upvotes: 1