MagikarpSama

Reputation: 323

How to use a previously matched regular expression again in Python?

I use regular expressions in Python to analyze this kind of text:

#0
$dumpvars
0!
0"
0#
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 7
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 6
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 5
b0000000000000000 $
bxxxxxxxxxxxxxxxx /
bxxxxxxxxxxxxxxxx .
bxxxxxxxxxxxxxxxx )
b0111111111111111 %
bxxxxxxxxxxxxxxxx 1
bxxxxxxxxxxxxxxxx 0
bxxxxxxxxxxxxxxxx *
b10101010101010101010101010101010 &
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ,
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 2
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx -
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 3
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 4
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (
bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
$end
#600
1!
b0000000000000000 )
b0111111111111111 *
b10101010101010101010101010101010 +
b0000000000000000 /
b0111111111111111 1
b00000000000000000000000000000000 5
b10101010101010101010101010101010 4
b00000000000000000000000000000000 2
b00000000000000000000000000000000 3
b010101010101010101010101010101010 7
#1200

Now I want to extract everything between two consecutive "#(number)" entries: the text between #0 and #600, and also the text between #600 and #1200.

I already wrote the following regular expression for this:

(?s)(\#\d{1,})(.*?)(\#\d{1,})

There is a version of it with the text I want to match here: https://regex101.com/r/nH65Cw/6

But as you can see, it skips every second text block that I need.

How can I include the skipped text blocks as well?

Upvotes: 4

Views: 190

Answers (2)

Pedro Lobito

Reputation: 98961

You can use re.split with ^#[0-9]+, i.e.:

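import re
# _string holds the text from the question.
# re.MULTILINE lets ^ match at the start of every line, not only at the start of the string.
result = re.split(r"^#[0-9]+", _string, flags=re.MULTILINE)
result = list(filter(None, result))  # removes empty matches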
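
If the "#(number)" markers should be kept alongside each block, one option is to put the marker in a capturing group, since re.split then returns the captured text as well. The following is only a sketch: the sample string is a shortened stand-in for the dump in the question, and the names sample, parts and blocks are made up for illustration.

```python
import re

# Shortened stand-in for the dump in the question (assumed sample data).
sample = "#0\n$dumpvars\n0!\n0#\n$end\n#600\n1!\nb0000000000000000 )\n#1200\n"

# The capturing group makes re.split keep each "#<number>" marker in the result.
# The ^ anchor (with re.MULTILINE) keeps a line such as "0#" from being treated as a marker.
parts = re.split(r"^(#[0-9]+)\n", sample, flags=re.MULTILINE)

# parts[0] is whatever precedes the first marker; after that, markers and
# block bodies alternate. Pair them up and drop markers with an empty body.
blocks = [(marker, body) for marker, body in zip(parts[1::2], parts[2::2]) if body]

for marker, body in blocks:
    print(marker, repr(body))
```

The trailing #1200 has no text after it in this sample, so it is filtered out and only the #0 and #600 blocks remain.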


Upvotes: 0

The fourth bird

Reputation: 163457

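Your pattern consumes the closing #(number) as part of the match, so that marker can no longer be the start of the next match. That is why every second block is skipped.

You could use a positive lookahead (?=...) for the last part, which checks that the marker follows without consuming it: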

(?s)(\#\d{1,})(.*?)(?=(\#\d{1,}))
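
To see the difference in Python, here is a small sketch that runs both patterns over a shortened, made-up sample (the variable names are only illustrative, and \d+ is used as an equivalent of \d{1,}):

```python
import re

# Made-up, shortened sample with four markers (not the full dump from the question).
sample = "#0\n0!\n#600\n1!\n#1200\n2!\n#1800\n"

# Original pattern: the closing marker is consumed by the match.
consuming = r"(?s)(#\d+)(.*?)(#\d+)"
# Lookahead variant: the closing marker is only checked, not consumed.
lookahead = r"(?s)(#\d+)(.*?)(?=#\d+)"

# Collect the opening marker of every match for each pattern.
skipped = [m.group(1) for m in re.finditer(consuming, sample)]
complete = [m.group(1) for m in re.finditer(lookahead, sample)]

print(skipped)   # ['#0', '#1200']
print(complete)  # ['#0', '#600', '#1200']
```

With the lookahead, m.group(2) holds the text of each block. The final #1800 yields no match because no further marker follows it.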

Upvotes: 3
