Reputation: 67
I have the following text file:
#Beginning of the file

aaa
bbbb
ccc

dddd
eee
ffff

ggggg
hhhsasas
dsdsd
#end of file
How can I match each section of this file, starting from the empty line that precedes the section and running to the end of that section's text (without the empty line that begins the next section)? Using the example above, I would like to get 3 matches:
#Beginning of the first match

aaa
bbbb
ccc
#End of first match
#Beginning of the second match

dddd
eee
ffff
#End of second match
#Beginning of the third match

ggggg
hhhsasas
dsdsd
#End of third match
I've tried something like this:
(\n\n)[^(\n\n)]*
but it doesn't work as I want: inside a character class, (\n\n) is not treated as a group but as a set of separate characters, so [^(\n\n)]* stops at the first newline and the match ends at the end of the first line.
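To make the problem concrete, here is a minimal Python sketch of that attempt. The sample `text` is a shortened, made-up version of the file above with \n line endings; it is only meant to show where the character class stops.

```python
import re

# Hypothetical, shortened sample of the file (assumes \n line endings).
text = "#Beginning of the file\n\naaa\nbbbb\nccc\n\ndddd\neee\nffff\n"

# Inside [...], the characters "(", "\n" and ")" are just members of a set,
# so [^(\n\n)]* means "anything except '(', ')' or a newline".
attempt = re.compile(r"(\n\n)[^(\n\n)]*")

matches = [m.group(0) for m in attempt.finditer(text)]
print(matches)  # each match is the blank line plus only the first line after it
```

Each match covers the two newlines and then just one line of the section, because the negated class cannot cross a newline.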
Upvotes: 2
Views: 6965
Reputation: 424983
Split on a look-ahead for a blank line:
String[] sets = input.split("(?m)(?=$\\s^$)");
The "multi line" regex switch (?m) makes ^ and $ match at the start/end of each line, and using \s to match newlines means this will work on Unix, Mac and Windows files.
This preserves the blank lines. If you just want the lines, change the regex to remove the look-ahead: (?m)$\\s^$
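For comparison, here is a rough Python sketch of the same look-ahead split. It is a translation under assumptions, not the Java code itself: the sample `text` is made up from the question's example, it uses \n line endings with no trailing newline, and splitting on an empty match needs Python 3.7 or later.

```python
import re

# Hypothetical sample: three sections, each preceded by an empty line
# (assumes \n line endings and no trailing newline).
text = "\naaa\nbbbb\nccc\n\ndddd\neee\nffff\n\nggggg\nhhhsasas\ndsdsd"

# Zero-width split point: end of a line, one whitespace character,
# then an empty line (^ immediately followed by $).
blank_line_ahead = re.compile(r"(?m)(?=$\s^$)")

sections = blank_line_ahead.split(text)

# Drop the surrounding newlines if only the lines themselves are wanted.
trimmed = [s.strip() for s in sections]
print(trimmed)
```

Because the split is zero-width, every piece after the first still begins with the newline that ended the previous section plus the blank line, which is why the sketch strips each piece afterwards.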
Upvotes: 0
Reputation: 70923
One newline, followed by one or more repetitions of (non-newline characters + newline):
/\n(?:[^\n]+\n)+/
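As a quick check, this pattern can be tried in Python like so. This is only a sketch: the sample `text` is a made-up copy of the question's example, and it assumes \n line endings and a newline after the last line.

```python
import re

# Hypothetical sample: each section starts with an empty line,
# and the last line ends with a newline (assumes \n line endings).
text = "\naaa\nbbbb\nccc\n\ndddd\neee\nffff\n\nggggg\nhhhsasas\ndsdsd\n"

# One newline (the empty line), then one or more non-empty lines.
section = re.compile(r"\n(?:[^\n]+\n)+")

matches = section.findall(text)
for m in matches:
    print(repr(m))
```

Each match starts with the empty line and stops before the next one, because [^\n]+ cannot match an empty line.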
Upvotes: 0
Reputation: 14334
Using positive lookahead:
.+?(?=^$)
This requires the 'dot matches new line' and '^$ match at line breaks' switches.
You can activate 'dot matches new line' in the expression...
(?s).+?(?=^$)
...and the '^$ match at line breaks' switch can be added the same way, as (?ms), or set in code (for example with Python's re.MULTILINE flag).
Input:

aaa
bbbb
ccc

dddd
eee
ffff

ggggg
hhhsasas
dsdsd
Results (each match is followed by its start position and length):
Match 1:

aaa
bbbb
ccc
0 18
Match 2:

dddd
eee
ffff
18 19
Match 3:

ggggg
hhhsasas
dsdsd
37 26
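If you want to try the lookahead version in Python, a minimal sketch could look like this, with both switches set in code. The sample `text` is a made-up copy of the input and assumes \n line endings with a newline after the last line, so the positions and lengths it prints are for that string only.

```python
import re

# Hypothetical sample input (assumes \n line endings, trailing newline).
text = "\naaa\nbbbb\nccc\n\ndddd\neee\nffff\n\nggggg\nhhhsasas\ndsdsd\n"

# DOTALL lets "." cross newlines; MULTILINE lets ^ and $ match at line breaks.
section = re.compile(r".+?(?=^$)", re.DOTALL | re.MULTILINE)

# Collect (start position, length) for every match.
spans = [(m.start(), len(m.group(0))) for m in section.finditer(text)]
print(spans)
```

The lazy .+? stops at the first position where an empty line (or the end of the text after a final newline) begins, so every match starts with the empty line that precedes its section.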
EDIT
Here is the whole lot with no switches. Note the optional carriage return for platform independence. The final empty line is also optional:
(.+\r?\n)+(?=(\r?\n)?)
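A small Python sketch of the switch-free version, under the assumption of \n line endings (the sample `text` is made up from the input above):

```python
import re

# Hypothetical sample input (assumes \n line endings, trailing newline).
text = "\naaa\nbbbb\nccc\n\ndddd\neee\nffff\n\nggggg\nhhhsasas\ndsdsd\n"

# One or more non-empty lines; the optional lookahead never rejects a match.
section = re.compile(r"(.+\r?\n)+(?=(\r?\n)?)")

# The pattern has groups, so take group(0) to get the whole match.
matches = [m.group(0) for m in section.finditer(text)]
for m in matches:
    print(repr(m))
```

Note that this version matches only the lines of each section, not the empty line before it. Also, in Python "." matches \r as well, so with \r\n input an empty line would count as content; treat the \r? part as something to verify in your own regex engine.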
Upvotes: 3
Reputation: 21
Tested on regexr, the following seems to yield the right results. I am capturing the first empty line, as in your example. Note the use of the dotall (/s) switch to allow a whole block to be picked up, and a lazy match (+?) to stop it from running all the way to the end.
/\b.+?(?=\r\r)/gs
I expect you would need to use the correct new-line character depending on your environment.
Upvotes: 0