Reputation: 33
I have a file with the following text structure and would like to parse the data inside it into an array:
21/5/12 14:23:36: A: XXXX
YYY
ZZZ
21/5/12 14:23:25: B: XXX ZZZ YYY
21/5/12 14:23:25: B: XXX ZZZ YYY
I am using data.match(/[^\r\n]+\d+.*/g)
to parse the data from the file, and the result is:
arr[0], 21/5/12 14:23:36: A: XXXX
arr[1], 21/5/12 14:23:25: B: XXX ZZZ YYY
arr[2], 21/5/12 14:23:25: B: XXX ZZZ YYY
The continuation lines of the first item (YYY and ZZZ) have been dropped, which is not what I want.
Is it possible to use regular expression to parse the text like this?
Upvotes: 3
Views: 292
Reputation: 3069
You could also try using the following modification of your regex:
PATTERN
/[^\r\n]+\d+[a-zA-Z:\s]+/g
You are using .*
which matches any character except a line break (unless the dotall flag is on). Since you were not using that flag, a match cannot span multiple lines; if you turned it on, it would capture the whole string as one match, which is not desired either. The character class [a-zA-Z:\s] does include line breaks (through \s) but no digits, so each match runs across the continuation lines and stops at the next date. Here is the sample input and the output produced by the modification I have provided:
INPUT
21/5/12 14:23:36: A: XXXX
YYY
ZZZ
21/5/12 14:23:25: B: XXX ZZZ YYY
21/5/12 14:23:25: B: XXX ZZZ YYY
OUTPUT
Match 1:
21/5/12 14:23:36: A: XXXX
YYY
ZZZ
Match 2:
21/5/12 14:23:25: B: XXX ZZZ YYY
Match 3:
21/5/12 14:23:25: B: XXX ZZZ YYY
I'm not entirely sure that I understand your intentions correctly. If you do not want the line breaks in the first match, you could probably remove them with a JavaScript string function, since you still get the whole match as one string. Unfortunately, I do not know JavaScript.
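For what it's worth, a minimal JavaScript sketch of that cleanup step might look like the following. The variable names (data, pattern, entries) are only illustrative, and it assumes the message text contains nothing but letters, colons and whitespace, because the pattern stops at the first character outside that set:

```javascript
// Sample input from the question (hypothetical variable name).
const data = [
  "21/5/12 14:23:36: A: XXXX",
  "YYY",
  "ZZZ",
  "21/5/12 14:23:25: B: XXX ZZZ YYY",
  "21/5/12 14:23:25: B: XXX ZZZ YYY",
].join("\n");

// The pattern from this answer: a single match may span several lines.
const pattern = /[^\r\n]+\d+[a-zA-Z:\s]+/g;

// match() returns null when nothing matches, so fall back to an empty array.
const entries = (data.match(pattern) || []).map((m) =>
  // Collapse every run of whitespace (line breaks included) to one space.
  m.replace(/\s+/g, " ").trim()
);

console.log(entries);
```

Each element of entries is then a single-line string, with the continuation lines of the first item joined by spaces.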
Upvotes: 0
Reputation:
You can do it with a single regular expression; however, given your data source, the first result will still have linefeeds between "XXXX", "YYY" and "ZZZ":
var arr = data.split(/[\n\s]+(?=\d\d?\/\d\d?\/\d\d)/);
Translation: "cut on linefeeds and spaces that are followed by a date".
In case you want to remove these extra linefeeds, you could replace them before splitting:
var arr = data.replace(/[\s\n]+(?!\d\d?\/\d\d?\/\d\d)/g, ' ').split(/\s*\n/);
Translation: "replace linefeeds and spaces that are not followed by a date with a single space, then cut on the remaining linefeeds, including preceding spaces".
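The two steps can also be done the other way round: split first on whitespace that precedes a date, then collapse whatever whitespace is left inside each item. The sketch below is only one possible arrangement; the names data, dateAhead and items are made up for the example, and it assumes that a date never appears in the middle of a message:

```javascript
// Sample input from the question, with a trailing newline as files often have.
const data =
  "21/5/12 14:23:36: A: XXXX\nYYY\nZZZ\n" +
  "21/5/12 14:23:25: B: XXX ZZZ YYY\n" +
  "21/5/12 14:23:25: B: XXX ZZZ YYY\n";

// Whitespace that is immediately followed by something shaped like d/m/yy.
const dateAhead = /\s+(?=\d\d?\/\d\d?\/\d\d)/;

const items = data
  .trim() // drop the trailing newline so the last item stays clean
  .split(dateAhead) // one chunk per dated entry
  .map((item) => item.replace(/\s+/g, " ")); // join continuation lines

console.log(items);
```

Splitting first keeps each regular expression simple, at the cost of one extra pass over every item.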
Upvotes: 0
Reputation: 71538
You could perhaps try to parse the text a bit more strictly? I'm suggesting something like this:
/\d+\/\d+\/\d+\s+\d+:\d+:\d+:[^\r\n]+(?:[\s\S]*?(?=\s^\d+\/))?/gm
\d+\/\d+\/\d+\s+\d+:\d+:\d+:
should be quite easy to understand as it's quite literal: it matches the date and time prefix.
[^\r\n]+
matches everything remaining on the same line.
(?:[\s\S]*?(?=\s^\d+\/))?
matches any following lines, up to the next line that starts with digits followed by a forward slash (indicating a date). The m flag is needed so that ^ matches at the start of every line rather than only at the start of the input.
You can then use .replace
with a callback that runs a second replace to clean up the line breaks (you could also use match and then loop through the matches to remove the newlines).
var results = text.replace(/\d+\/\d+\/\d+\s+\d+:\d+:\d+:[^\r\n]+(?:[\s\S]*?(?=\s^\d+\/))?/gm, function(m) {
  return m.replace(/\s+/g, " ");
});
Output:
21/5/12 14:23:36: A: XXXX YYY ZZZ
21/5/12 14:23:25: B: XXX ZZZ YYY
21/5/12 14:23:25: B: XXX ZZZ YYY
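If you would rather end up with an array than a single string, the match-and-loop variant mentioned above could be sketched as follows. It uses the pattern with the g and m flags (so that ^ matches at each line start) and a lazy [\s\S]*? for the continuation lines; text, entryPattern and entries are illustrative names, not anything from your code:

```javascript
// Sample input from the question (hypothetical variable name).
const text =
  "21/5/12 14:23:36: A: XXXX\nYYY\nZZZ\n" +
  "21/5/12 14:23:25: B: XXX ZZZ YYY\n" +
  "21/5/12 14:23:25: B: XXX ZZZ YYY";

// Date/time prefix, rest of that line, then optional continuation lines
// up to (but not including) the whitespace before the next dated line.
const entryPattern =
  /\d+\/\d+\/\d+\s+\d+:\d+:\d+:[^\r\n]+(?:[\s\S]*?(?=\s^\d+\/))?/gm;

// match() returns null when there is no match at all.
const entries = (text.match(entryPattern) || []).map((m) =>
  m.replace(/\s+/g, " ").trim() // flatten each entry onto one line
);

console.log(entries);
```

One limitation to be aware of: continuation lines after the very last entry are not picked up, because no dated line follows them to satisfy the lookahead.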
Upvotes: 0
Reputation: 29448
I'm not sure of the exact requirement, but if there is an empty line between each data item, you can do it like this:
var data ="21/5/12 14:23:36: A: XXXX\r\nYYY\nZZZ\r\n\r\n21/5/12 14:23:25: B: XXX ZZZ YYY\r\n\r\n21/5/12 14:23:25: B: XXX ZZZ YYY";
data.split(/\r\n\r\n/);
The result of this code is:
["21/5/12 14:23:36: A: XXXX
YYY
ZZZ", "21/5/12 14:23:25: B: XXX ZZZ YYY", "21/5/12 14:23:25: B: XXX ZZZ YYY"]
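Note that splitting on \r\n\r\n only recognises Windows-style blank lines. As a hedged variation, assuming the file may use either \r\n or \n line endings, the separator can be written as two or more consecutive line breaks. The names blankLine and items are illustrative:

```javascript
// Same sample as above, with mixed line endings inside the first item.
const data =
  "21/5/12 14:23:36: A: XXXX\r\nYYY\nZZZ\r\n\r\n" +
  "21/5/12 14:23:25: B: XXX ZZZ YYY\r\n\r\n" +
  "21/5/12 14:23:25: B: XXX ZZZ YYY";

// Two or more consecutive line breaks, each either \r\n or \n.
const blankLine = /(?:\r?\n){2,}/;

// Single line breaks inside an item are left untouched.
const items = data.split(blankLine);

console.log(items.length);
```

The line breaks inside the first item are preserved as they were in the input; a further replace(/\s+/g, " ") on each item would flatten them if needed.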
Upvotes: 1
Reputation: 3872
If you need to extract the date portions of the text (the day and month may have one or two digits, so \d{1,2} is used for them):
data.match(/\d{1,2}\/\d{1,2}\/\d{2} \d{2}:\d{2}:\d{2}/g)
It will produce the result:
arr[0], 21/5/12 14:23:36
arr[1], 21/5/12 14:23:25
arr[2], 21/5/12 14:23:25
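If the goal is to work with the timestamps rather than just the strings, they could be turned into Date objects along these lines. This is only a sketch under two loud assumptions that the question does not confirm: the order is day/month/year, and the two-digit year belongs to 2000-2099. The names stampPattern and dates are illustrative:

```javascript
// Sample input from the question (hypothetical variable name).
const data =
  "21/5/12 14:23:36: A: XXXX\nYYY\nZZZ\n" +
  "21/5/12 14:23:25: B: XXX ZZZ YYY\n" +
  "21/5/12 14:23:25: B: XXX ZZZ YYY";

// Capture day, month, 2-digit year, hours, minutes, seconds.
const stampPattern = /(\d{1,2})\/(\d{1,2})\/(\d{2}) (\d{2}):(\d{2}):(\d{2})/g;

const dates = Array.from(data.matchAll(stampPattern), (m) => {
  // m[0] is the whole match; m[1]..m[6] are the captured groups.
  const [day, month, year, hours, minutes, seconds] = m.slice(1).map(Number);
  // ASSUMPTION: day/month/year order and a year in 2000-2099.
  // Date months are zero-based, hence month - 1. Local time is used.
  return new Date(2000 + year, month - 1, day, hours, minutes, seconds);
});

console.log(dates.length);
```

String.prototype.matchAll requires a regular expression with the g flag and an ES2020-capable runtime; in older environments a loop over stampPattern.exec(data) does the same job.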
Upvotes: 0