Tom

Reputation: 67

Regex - how not to match two newlines

I have the following text file:

#Beginning of the file

aaa
bbbb
ccc

dddd
eee
ffff

ggggg
hhhsasas
dsdsd

#end of file

How can I match each section of this file, starting from the empty line that precedes the section and running to the end of the section's text (without the newline that separates it from the next section)? Using the example above, I would like to get 3 matches:

#Beginning of the first match    

aaa
bbbb
ccc
#End of first match
#Beginning of the second match

dddd
eee
ffff
#End of second match
#Beginning of the third match

ggggg
hhhsasas
dsdsd
#End of third match

I've tried something like this:

(\n\n)[^(\n\n)]*

but it doesn't work as I want: inside a character class, (\n\n) is not treated as a group but as separate characters, so [^(\n\n)]* stops at the first newline and the match ends at the end of the first line.
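To make the problem concrete, here is a minimal sketch in Python. The choice of Python's re module and the shortened sample string are assumptions made only for this illustration; other engines treat character classes the same way.

```python
import re

# Shortened sample (assumed): a header line, a blank line, then two sections.
text = "x\n\naaa\nbbbb\nccc\n\ndddd"

# The attempted pattern: two newlines, then anything not in the class.
match = re.search(r"(\n\n)[^(\n\n)]*", text)

# Only the first line of the section is captured, because the class
# excludes every newline, not just a pair of newlines.
first_hit = match.group(0)

# The parentheses are literal members of the class too, so "(" is rejected.
rejects_paren = re.fullmatch(r"[^(\n\n)]*", "a(b") is None
```

Here first_hit is "\n\naaa", which is the behaviour described above.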

Upvotes: 2

Views: 6965

Answers (4)

Bohemian

Reputation: 424983

Split on a look-ahead for a blank line:

String[] sets = input.split("(?m)(?=$\\s^$)");

The "multi line" regex switch (?m) makes ^ and $ match at the start and end of each line, and using \s to match the newline means this will work on Unix, Mac and Windows files.

This preserves the blank lines. If you just want the lines, change the regex to remove the look-ahead: (?m)$\\s^$
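For readers not using Java, the same idea can be sketched in Python. This is an analogue rather than the code above: it assumes Python 3.7 or later (where re.split accepts patterns that can match an empty string), Unix newlines, and a sample string with no trailing newline.

```python
import re

# Sample input (assumed): three blocks separated by single blank lines.
text = "aaa\nbbbb\nccc\n\ndddd\neee\nffff\n\nggggg\nhhhsasas\ndsdsd"

# Zero-width split: cut at the end of any line that is followed by a
# blank line. Nothing is consumed, so the newlines stay in the pieces.
kept = re.split(r"(?m)(?=$\s^$)", text)

# Consuming split: the same pattern without the look-ahead removes one
# newline at each cut.
dropped = re.split(r"(?m)$\s^$", text)

for piece in kept:
    print(repr(piece))
```

With this input, every piece after the first in kept starts with "\n\n", while the pieces in dropped start with a single "\n", so a final strip may still be wanted if only the text lines matter.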

Upvotes: 0

MC ND

Reputation: 70923

Match one newline, followed by one or more repetitions of (one or more non-newline characters, then a newline):

/\n(?:[^\n]+\n)+/
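A minimal Python sketch of this pattern follows. It assumes the file content begins with the blank line, ends with a newline, uses Unix line endings, and does not contain the "#Beginning of the file" and "#end of file" marker lines.

```python
import re

# Assumed input: starts with the blank line and ends with a newline.
text = "\naaa\nbbbb\nccc\n\ndddd\neee\nffff\n\nggggg\nhhhsasas\ndsdsd\n"

# One newline, then one or more non-empty lines including their newlines.
pattern = re.compile(r"\n(?:[^\n]+\n)+")

# The group is non-capturing, so findall returns the full matches.
blocks = pattern.findall(text)

for block in blocks:
    print(repr(block))
```

Each match begins with the newline of the blank line and ends with the newline of the section's last line, so the three matches together cover the whole sample.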

Upvotes: 0

Gusdor

Reputation: 14334

Using positive lookahead:

.+?(?=^$)

This requires the 'dot matches new line' and '^$ match at line breaks' switches.

You can activate 'dot matches new line' in the expression...

(?s).+?(?=^$)

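...and the '^$ match at line breaks' switch can be combined with it in engines that accept several inline modifiers (Python, for example, accepts (?sm)), or it can be set in code through the regex flags.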
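As a sketch of setting both switches in code, the Python snippet below passes re.DOTALL and re.MULTILINE as flags. The sample string is an assumption (Unix newlines, a leading blank line, one trailing newline), and the result relies on Python treating the end of the string after a final newline as an empty line.

```python
import re

# Assumed input: starts with the blank line and ends with one newline.
text = "\naaa\nbbbb\nccc\n\ndddd\neee\nffff\n\nggggg\nhhhsasas\ndsdsd\n"

# DOTALL lets "." cross line breaks; MULTILINE lets ^ and $ match at
# every line boundary, so (?=^$) stops the lazy match before an empty line.
pattern = re.compile(r".+?(?=^$)", re.DOTALL | re.MULTILINE)

sections = pattern.findall(text)

for section in sections:
    print(repr(section))
```

With this input there are three matches, and each one keeps the blank line's newline at its start.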

Input:

aaa
bbbb
ccc

dddd
eee
ffff

ggggg
hhhsasas
dsdsd

Results:

Match 1:    
aaa
bbbb
ccc
         0      18
Match 2:    
dddd
eee
ffff
        18      19
Match 3:    
ggggg
hhhsasas
dsdsd
        37      26

EDIT

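Here is the whole lot with no switches. Note the optional carriage return for platform independence; the line body uses [^\r\n]+ rather than .+ because in some engines (Python and .NET, for example) . also matches \r, which would let a Windows blank line count as text. The final empty line is also optional:

([^\r\n]+\r?\n)+(?=(\r?\n)?)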
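The switch-free approach can be checked against both line-ending styles with a short Python sketch. It is a variant, not the exact expression above: it writes the line body as [^\r\n]+ because Python's . also matches \r, it drops the look-ahead, and the sample strings are assumptions.

```python
import re

# Assumed samples: two blocks, first with Unix and then Windows line endings.
unix = "aaa\nbbbb\nccc\n\ndddd\neee\nffff\n"
windows = unix.replace("\n", "\r\n")

# One or more non-empty lines, each ending in LF or CRLF.
block = re.compile(r"(?:[^\r\n]+\r?\n)+")

unix_blocks = block.findall(unix)
windows_blocks = block.findall(windows)

# For comparison: with ".+" the lone "\r" of a CRLF blank line counts as
# line content in Python, so the whole text comes back as one match.
dot_blocks = re.findall(r"(?:.+\r?\n)+", windows)
```

Both unix_blocks and windows_blocks contain two blocks here, whereas dot_blocks contains a single match spanning the entire Windows sample.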

Upvotes: 3

AnkurTG

Reputation: 21

Tested on regexr, the following seems to yield the right results. I am capturing the first empty line, as in your example. Note the use of dotall (/s) switch to allow a whole block to be picked, and a lazy match (+?) to stop it from running all the way to the end.

/\b.+?(?=\r\r)/gs

I expect you would need to use the correct new-line character depending on your environment.

Upvotes: 0
