DisplayedName

Reputation: 97

Regular expressions in Python: match something depending on a variable number of newlines

I have a string in the following format:

 * some stuff before

Time = 23.96480001

* some stuff that varies - can be 10 lines, can be 100 lines

ExecutionTime = 399500.83 s

* some stuff after

Time = 23.96480016

* repeat

My regex kind of works:

rgx = "\nTime = (?P<time>\d+\.\d+)(\n.*){5,810}ExecutionTime = (?P<exec>\d+\.\d+)"

But it doesn't match everything.

When the number in the braces {5,810} is too big or too small, it never finds everything.

Is there a way to improve this regex so that, instead of a fixed number of newlines, it matches "until it finds ExecutionTime = ..."?

Thank you very much to everyone.

Upvotes: 1

Views: 49

Answers (1)

The fourth bird

Reputation: 163207

You could use a negative lookahead to match all following lines that do not start with either ExecutionTime = or Time = so you do not have to specify a quantifier with a fixed range.

^Time = (?P<time>\d+\.\d+)((?:\n(?!(?:Execution)?Time =).*)*)\nExecutionTime = (?P<exec>\d+\.\d+)

Explanation

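  • ^ Start of the line (use re.MULTILINE so that ^ also matches right after each newline; without that flag, ^ only matches at the start of the string)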
  • Time = Match literally
  • (?P<time>\d+\.\d+) Named group time to match 1+ digits, a dot and 1+ digits
  • ( Capture group 1 to capture what is between Time and ExecutionTime
    • (?: Non capture group
      • \n(?! Match a newline, and assert using a negative lookahead that what is to the right is not
        • (?:Execution)?Time = Match optional Execution and Time =
      • ) Close lookahead
      • .* If the assertion is true, match the rest of the line
    • )* Close the non capture group and optionally repeat it
  • ) Close capture group 1
  • \nExecutionTime = Match literally
  • (?P<exec>\d+\.\d+) Named group exec to match 1+ digits, a dot and 1+ digits
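As a minimal sketch of how the pattern above could be used from Python: the sample text, the constant names and the helper extract_pairs are made up for illustration and are not from the question. The only thing assumed beyond the pattern itself is that it is compiled with re.MULTILINE, so that ^ can match at the start of every line.

```python
import re

# Hypothetical sample text, shaped like the format described in the question.
SAMPLE_LOG = """\
some stuff before
Time = 23.96480001
solver line 1
solver line 2
ExecutionTime = 399500.83 s
some stuff after
Time = 23.96480016
solver line 3
ExecutionTime = 399512.10 s
"""

# Same pattern as in the answer, split over several string literals for
# readability. re.MULTILINE lets ^ match at the start of each line.
PATTERN = re.compile(
    r"^Time = (?P<time>\d+\.\d+)"
    r"((?:\n(?!(?:Execution)?Time =).*)*)"
    r"\nExecutionTime = (?P<exec>\d+\.\d+)",
    re.MULTILINE,
)


def extract_pairs(text):
    """Return a list of (time, exec) string pairs, one per matched block."""
    return [(m.group("time"), m.group("exec")) for m in PATTERN.finditer(text)]


pairs = extract_pairs(SAMPLE_LOG)
for time_value, exec_value in pairs:
    print(time_value, exec_value)
```

re.finditer walks through all non-overlapping matches, so each Time / ExecutionTime block yields one pair regardless of how many lines sit in between. A Time line that is followed by another Time line before any ExecutionTime line is skipped, because the lookahead stops the repetition at the second Time line.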

Regex demo

Upvotes: 1
