Reputation: 323
I use regular expressions in Python to analyze this kind of text:
#0
$dumpvars
0!
0"
0#
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 7
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 6
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 5
b0000000000000000 $
bxxxxxxxxxxxxxxxx /
bxxxxxxxxxxxxxxxx .
bxxxxxxxxxxxxxxxx )
b0111111111111111 %
bxxxxxxxxxxxxxxxx 1
bxxxxxxxxxxxxxxxx 0
bxxxxxxxxxxxxxxxx *
b10101010101010101010101010101010 &
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ,
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 2
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx -
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 3
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 4
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
$end
#600
1!
b0000000000000000 )
b0111111111111111 *
b10101010101010101010101010101010 +
b0000000000000000 /
b0111111111111111 1
b00000000000000000000000000000000 5
b10101010101010101010101010101010 4
b00000000000000000000000000000000 2
b00000000000000000000000000000000 3
b010101010101010101010101010101010 7
#1200
Now I want to extract everything between two "#(number)" entries, i.e. between #0 and #600 and between #600 and #1200.
I already wrote the following regular expression for this:
(?s)(\#\d{1,})(.*?)(\#\d{1,})
Here is a demo on regex101 with the text I want to match: https://regex101.com/r/nH65Cw/6
But as you can see, it skips every second text block that I need.
How can I include the skipped text blocks as well?
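A minimal reproduction in Python, using a shortened, hypothetical copy of the dump (the variable name dump is only for illustration). With three timestamps, findall returns just one block:

```python
import re

# Shortened, hypothetical copy of the dump: three "#<number>" timestamps
dump = "#0\n$dumpvars\n0!\n$end\n#600\n1!\nb0000000000000000 )\n#1200\n"

pattern = re.compile(r"(?s)(\#\d{1,})(.*?)(\#\d{1,})")

# findall returns one (start, body, end) tuple per match
matches = pattern.findall(dump)
print(len(matches))                   # 1
print(matches[0][0], matches[0][2])   # #0 #600
```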
Reputation: 98961
You can use re.split with the pattern ^#[0-9]+ and the re.MULTILINE flag, i.e.:
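import re
# _string holds the dump text; with re.MULTILINE, ^ matches at the start of every line
result = re.split(r"^#[0-9]+", _string, flags=re.MULTILINE)
result = [block for block in result if block.strip()]  # drop empty/whitespace-only pieces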
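A self-contained sketch of this approach, assuming a shortened, hypothetical sample named dump in place of _string. Because ^ only matches at line starts under re.MULTILINE, only lines beginning with #<digits> act as separators:

```python
import re

# Shortened, hypothetical sample of the dump
dump = "#0\n$dumpvars\n0!\n$end\n#600\n1!\nb0000000000000000 )\n#1200\n"

# Split at every line that starts with "#<digits>"
pieces = re.split(r"^#[0-9]+", dump, flags=re.MULTILINE)

# The piece before "#0" is "" and the piece after "#1200" is "\n": drop both,
# then trim the surrounding newlines of the real blocks
blocks = [p.strip("\n") for p in pieces if p.strip()]

for b in blocks:
    print(repr(b))
```

Note that re.split drops the separators themselves. If the timestamps are needed too, wrapping the pattern in a capturing group, r"^(#[0-9]+)", makes re.split include them as separate items in the result list.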
Reputation: 163457
Your pattern also consumes the closing #number, so that timestamp cannot be the start of the next match.
You could use a positive lookahead (?=...) for the last part, which asserts that the next #number follows without consuming it:
(?s)(\#\d{1,})(.*?)(?=(\#\d{1,}))
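A sketch of the lookahead version with re.findall, again on a shortened, hypothetical sample. Since the pattern has three groups (the third one sits inside the lookahead), findall returns 3-tuples:

```python
import re

# Shortened, hypothetical sample of the dump
dump = "#0\n$dumpvars\n0!\n$end\n#600\n1!\nb0000000000000000 )\n#1200\n"

# (?=...) checks that the next timestamp follows without consuming it,
# so "#600" is still available to start the second match
pattern = re.compile(r"(?s)(\#\d{1,})(.*?)(?=(\#\d{1,}))")

# Keep (timestamp, block) pairs; the third group is the lookahead's capture
blocks = [(start, body.strip("\n")) for start, body, _next in pattern.findall(dump)]

for start, body in blocks:
    print(start, repr(body))
# #0 '$dumpvars\n0!\n$end'
# #600 '1!\nb0000000000000000 )'
```

The block after the last timestamp (#1200) has no following #number, so it is not returned. If it matters, a lookahead such as (?=\#\d|\Z) would also accept the end of the string.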