user3155956

Reputation: 33

Javascript regular expression - multiple line

I have a file with the following text structure and would like to parse the data inside it into an array:

21/5/12 14:23:36: A: XXXX
YYY
ZZZ

21/5/12 14:23:25: B: XXX ZZZ YYY

21/5/12 14:23:25: B: XXX ZZZ YYY

I am using data.match(/[^\r\n]+\d+.*/g) to parse the data from the file, and the result is:

arr[0], 21/5/12 14:23:36: A: XXXX
arr[1], 21/5/12 14:23:25: B: XXX ZZZ YYY
arr[2], 21/5/12 14:23:25: B: XXX ZZZ YYY

Some of the text of the first item (the YYY and ZZZ lines) is lost, which is not what I want.

Is it possible to use a regular expression to parse text like this?

Upvotes: 3

Views: 292

Answers (5)

Tafari

Reputation: 3069

You could also try the following modification of your regex:

PATTERN

/[^\r\n]+\d+[a-zA-Z:\s]+/g

You are using .*, and . matches any character except a newline (unless the dotall flag is on; JavaScript supports it as the s flag since ES2018). Since you are not using this flag, your match doesn't span multiple lines, but if you turned it on, .* would capture the whole string as one match, which is not what you want either. Here is the sample input and the output produced by the modification I suggested:

INPUT

21/5/12 14:23:36: A: XXXX
YYY
ZZZ

21/5/12 14:23:25: B: XXX ZZZ YYY

21/5/12 14:23:25: B: XXX ZZZ YYY

OUTPUT

Match 1:

21/5/12 14:23:36: A: XXXX
YYY
ZZZ

Match 2:

21/5/12 14:23:25: B: XXX ZZZ YYY

Match 3:

21/5/12 14:23:25: B: XXX ZZZ YYY

I'm not entirely sure I understand your intentions. If you don't want the line breaks in the first match, you could probably remove them with a JavaScript string function, since you still get the whole match as one string. Unfortunately, I don't know JavaScript.
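For the line-break removal mentioned above, here is a minimal sketch, assuming the input uses \n line endings. It applies the pattern from this answer with String.prototype.match and then collapses the whitespace of each match with replace and trim. The helper name parseEntries is hypothetical.

```javascript
// Hypothetical helper: apply the pattern from this answer, then flatten each match.
function parseEntries(data) {
  // [a-zA-Z:\s]+ also consumes newlines, so each match runs through the
  // continuation lines (YYY, ZZZ) up to the next digit.
  // Caveat: after the last digit on an entry's first line, the match continues
  // only through letters, colons and whitespace, so digits or other punctuation
  // in the entry text can cut it short.
  const matches = data.match(/[^\r\n]+\d+[a-zA-Z:\s]+/g) || [];
  // Collapse every whitespace run (including line breaks) into one space.
  return matches.map(m => m.replace(/\s+/g, " ").trim());
}

const sample = "21/5/12 14:23:36: A: XXXX\nYYY\nZZZ\n\n" +
               "21/5/12 14:23:25: B: XXX ZZZ YYY\n\n" +
               "21/5/12 14:23:25: B: XXX ZZZ YYY";

// Expected: ["21/5/12 14:23:36: A: XXXX YYY ZZZ",
//            "21/5/12 14:23:25: B: XXX ZZZ YYY",
//            "21/5/12 14:23:25: B: XXX ZZZ YYY"]
console.log(parseEntries(sample));
```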

Upvotes: 0

user1636522

Reputation:

You can do it with a single regular expression. However, given your data source, the first result will still contain linefeeds between "XXXX", "YYY" and "ZZZ":

var arr = data.split(/\s+(?=\d\d?\/\d\d?\/\d\d)/);

Translation: "cut on linefeeds and spaces that are followed by a date".
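The split can be checked against the sample from the question. This is a minimal sketch assuming \n line endings; it writes the separator as \s+, which already includes \n. The lookahead keeps each date at the start of its piece instead of consuming it.

```javascript
// Sample text from the question, assuming "\n" line endings.
const sample = "21/5/12 14:23:36: A: XXXX\nYYY\nZZZ\n\n" +
               "21/5/12 14:23:25: B: XXX ZZZ YYY\n\n" +
               "21/5/12 14:23:25: B: XXX ZZZ YYY";

// Split on whitespace runs only when a d/m/yy date follows. The space between
// "21/5/12" and "14:23:36" is not a split point, because "14:" has no slash.
const entries = sample.split(/\s+(?=\d\d?\/\d\d?\/\d\d)/);

console.log(entries.length);             // 3
console.log(JSON.stringify(entries[0])); // "21/5/12 14:23:36: A: XXXX\nYYY\nZZZ"
```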

If you want to remove these extra linefeeds, you can replace them before splitting:

var arr = data.replace(/\s+(?!\d\d?\/\d\d?\/\d\d)/g, ' ').split(/\s*\n/);

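Translation: "replace linefeeds and spaces that are not followed by a date with a single space, then cut on the remaining linefeeds, including preceding spaces". A linefeed remains before each date because the whitespace run can backtrack: the run minus its last character is not followed by a date, so it gets replaced, and the linefeed right before the date survives.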
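Here is a sketch of the replace-then-split running on the sample, assuming the file uses \r\n line endings (an assumption; the question does not say). The comments trace what the negative lookahead leaves behind.

```javascript
// Sample text from the question, assuming "\r\n" line endings.
const sample = "21/5/12 14:23:36: A: XXXX\r\nYYY\r\nZZZ\r\n\r\n" +
               "21/5/12 14:23:25: B: XXX ZZZ YYY\r\n\r\n" +
               "21/5/12 14:23:25: B: XXX ZZZ YYY";

const flattened = sample
  // Any whitespace run NOT followed by a date becomes one space. Before a date,
  // \s+ backtracks by one character, so "\r\n\r\n" becomes " \n".
  .replace(/\s+(?!\d\d?\/\d\d?\/\d\d)/g, " ")
  // Only the "\n" in front of each date is left; split there,
  // dropping the space that precedes it.
  .split(/\s*\n/);

console.log(flattened);
```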

Upvotes: 0

Jerry

Reputation: 71538

You could try to parse the text a bit more strictly. I'm suggesting something like this:

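/\d+\/\d+\/\d+\s+\d+:\d+:\d+:[^\r\n]+(?:[\s\S]+?(?=\s^\d+\/))?/gm

\d+\/\d+\/\d+\s+\d+:\d+:\d+: is fairly literal: it matches the date, the time and the colon that follows.

[^\r\n]+ matches everything remaining on the same line.

(?:[\s\S]+?(?=\s^\d+\/))? lazily matches any following lines up to the next line that starts with digits followed by a forward slash (indicating a date). Because this lookahead uses ^, the m (multiline) flag is required: without it, ^ only matches at the very start of the string, so the continuation lines are never captured.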

Then use .replace instead of .match, passing a function that runs a second replace to clean up each entry (alternatively, you can match and then loop through the matches to remove the newlines):

var results = text.replace(/\d+\/\d+\/\d+\s+\d+:\d+:\d+:[^\r\n]+(?:[\s\S]+?(?=\s^\d+\/))?/gm, function(entry) {
    // Collapse every whitespace run (including newlines) inside the entry into one space
    return entry.replace(/\s+/g, " ");
});

Output:

21/5/12 14:23:36: A: XXXX YYY ZZZ 
21/5/12 14:23:25: B: XXX ZZZ YYY 
21/5/12 14:23:25: B: XXX ZZZ YYY

JSFiddle demo
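The match-and-loop alternative mentioned above could look like this. It is a minimal sketch assuming the same pattern with the m flag (which the ^ in the lookahead relies on); parseLog is a hypothetical name, and .trim() is added to drop the trailing space left where the newline before the next date was collapsed.

```javascript
// Same pattern as in this answer; the m flag lets ^ match at the start of each line.
const entryPattern = /\d+\/\d+\/\d+\s+\d+:\d+:\d+:[^\r\n]+(?:[\s\S]+?(?=\s^\d+\/))?/gm;

// Hypothetical helper: return one flattened string per log entry.
function parseLog(text) {
  // match() with the g flag returns every entry, or null if there are none.
  const entries = text.match(entryPattern) || [];
  // Collapse line breaks and repeated spaces inside each entry, then trim.
  return entries.map(entry => entry.replace(/\s+/g, " ").trim());
}

const sample = "21/5/12 14:23:36: A: XXXX\nYYY\nZZZ\n\n" +
               "21/5/12 14:23:25: B: XXX ZZZ YYY\n\n" +
               "21/5/12 14:23:25: B: XXX ZZZ YYY";

console.log(parseLog(sample));
```

Unlike the .replace version, this returns an array, which is closer to what the question asks for.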

Upvotes: 0

ntalbs

Reputation: 29448

I'm not sure of the exact requirement, but if there's an empty line between each data item, you can do it like this:

var data = "21/5/12 14:23:36: A: XXXX\r\nYYY\nZZZ\r\n\r\n21/5/12 14:23:25: B: XXX ZZZ YYY\r\n\r\n21/5/12 14:23:25: B: XXX ZZZ YYY";
// Split on an empty line; the optional \r also accepts files with plain \n line endings
data.split(/\r?\n\r?\n/);

The result of this code is:

["21/5/12 14:23:36: A: XXXX
YYY
ZZZ", "21/5/12 14:23:25: B: XXX ZZZ YYY", "21/5/12 14:23:25: B: XXX ZZZ YYY"]
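If the blank lines in the real file are less regular than in this example (several blank lines in a row, or blank lines containing only spaces or tabs), a looser separator might help. This is a sketch under those assumptions; splitOnBlankLines is a hypothetical helper. As with the code above, the first entry keeps its internal line breaks.

```javascript
// Hypothetical helper: split a log into entries at blank lines.
// Accepts "\n" or "\r\n" endings, several consecutive blank lines, and
// blank lines that contain only spaces or tabs.
function splitOnBlankLines(text) {
  return text
    .trim()                          // drop leading/trailing blank lines
    .split(/(?:\r?\n[ \t]*){2,}/);   // two or more line breaks in a row
}

var data = "21/5/12 14:23:36: A: XXXX\r\nYYY\nZZZ\r\n\r\n" +
           "21/5/12 14:23:25: B: XXX ZZZ YYY\r\n\r\n" +
           "21/5/12 14:23:25: B: XXX ZZZ YYY";

console.log(splitOnBlankLines(data).length); // 3
```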

Upvotes: 1

Ruben Kazumov

Reputation: 3872

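If you need to extract the date portions of the text:

data.match(/\d{1,2}\/\d{1,2}\/\d{2} \d{2}:\d{2}:\d{2}/g)

(The day and month use \d{1,2} because the month in 21/5/12 has a single digit; with \d{2} there, nothing in the sample would match.)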

It will produce the result:

arr[0], 21/5/12 14:23:36
arr[1], 21/5/12 14:23:25
arr[2], 21/5/12 14:23:25
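If the individual date and time fields are needed rather than the raw strings, capture groups can pull them out. This is a sketch under stated assumptions: the dates are taken to be day/month/two-digit-year (21/5/12 can only be D/M/Y, since 21 is not a valid month), the century is left unresolved, and extractTimestamps is a hypothetical helper. String.prototype.matchAll requires ES2020 or later.

```javascript
// Hypothetical helper: return the timestamp of every entry as numeric parts.
// Assumes day/month/two-digit-year order; the century is not resolved here.
function extractTimestamps(text) {
  // Day and month may have one or two digits; year and time fields have two.
  const re = /(\d{1,2})\/(\d{1,2})\/(\d{2}) (\d{2}):(\d{2}):(\d{2})/g;
  return Array.from(text.matchAll(re), m => ({
    day: Number(m[1]),
    month: Number(m[2]),
    year: Number(m[3]),
    hours: Number(m[4]),
    minutes: Number(m[5]),
    seconds: Number(m[6]),
  }));
}

const sample = "21/5/12 14:23:36: A: XXXX\nYYY\nZZZ\n\n" +
               "21/5/12 14:23:25: B: XXX ZZZ YYY\n\n" +
               "21/5/12 14:23:25: B: XXX ZZZ YYY";

console.log(extractTimestamps(sample));
```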

Upvotes: 0
