I'm trying to capture a block of text from a Linux command's output, where blocks are delimited by a blank line. However, I can't get the regex to match past the first line.
The output looks like the sample below, where the blocks are of variable length. What should the regex be to match past the first line and up to the blank line?
This is the regex I thought would logically match, but it doesn't:
[0-9a-f]{2}:[0-9a-f]{2}.[0-9a-f].*^$
If I remove the ^$, it matches up to the end of the line but no further. If I add the ^$, it doesn't match up to the blank line as I expected. None of the other regexes I tried work either, but I don't want to clog up this question with all of them.
I'm writing a script in Python, but I'm testing the regex manually with the Unix less command, since that makes debugging by trial and error easier.
00:00.4 Host bridge: Intel Corporation Ice Lake IEH
Subsystem: Intel Corporation Device 0000
Flags: fast devsel, NUMA node 0, IOMMU group 3
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00

00:01.0 System peripheral: Intel Corporation Ice Lake CBDMA [QuickData Technology]
Subsystem: Intel Corporation Device 0000
Flags: bus master, fast devsel, latency 0, IRQ 255, NUMA node 0, IOMMU group 4
Memory at 38fffff50000 (64-bit, non-prefetchable) [disabled] [size=16K]
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [80] Power Management version 3
Capabilities: [ac] MSI-X: Enable- Count=1 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [1d0] Latency Tolerance Reporting
Kernel modules: ioatdma
With the sample output above, say I want to capture the lines from 00:00.4 up to the blank line, i.e. these 4 lines:
00:00.4 Host bridge: Intel Corporation Ice Lake IEH
Subsystem: Intel Corporation Device 0000
Flags: fast devsel, NUMA node 0, IOMMU group 3
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
What would the regex need to be to do that?
To match the rest of the first line plus any consecutive non-empty lines after it:
[\da-f]{2}:[\da-f]{2}\.[\da-f].*(?:\n.+)*
See this demo at regex101 (without the single-line s flag). Use it with re.search to get just one match (Python demo).
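A minimal Python sketch of that re.search call, assuming the command output is already in a string. SAMPLE is a trimmed copy of the output from the question (most lines of the second block are left out), and the names SAMPLE, BLOCK_RE and block are only illustrative.

```python
import re

# Trimmed copy of the sample output from the question; the blank line
# separates the two device blocks. No trailing newline at the end.
SAMPLE = """\
00:00.4 Host bridge: Intel Corporation Ice Lake IEH
Subsystem: Intel Corporation Device 0000
Flags: fast devsel, NUMA node 0, IOMMU group 3
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00

00:01.0 System peripheral: Intel Corporation Ice Lake CBDMA [QuickData Technology]
Subsystem: Intel Corporation Device 0000
Kernel modules: ioatdma"""

# Slot prefix, the rest of that line, then any number of
# "newline followed by a non-empty line". An empty line stops the repetition,
# because ".+" needs at least one non-newline character.
BLOCK_RE = re.compile(r"[\da-f]{2}:[\da-f]{2}\.[\da-f].*(?:\n.+)*")

# No DOTALL flag, so "." never crosses a newline on its own.
match = BLOCK_RE.search(SAMPLE)
block = match.group() if match else None
print(block)
```

re.search returns only the first block; BLOCK_RE.finditer(SAMPLE) would walk through every block in turn.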
- . the dot matches any character (besides newline)
- (?: ) non-capturing group, used for repetition
- \n matches a newline
- * any amount (zero or more)
- + one or more

Note that you need to escape the dot (\.) to remove its special meaning in regex and match a literal dot.
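A quick Python check of why the escape matters; the input "00:01x0" is a made-up string, not real lspci output, and the names UNESCAPED and ESCAPED are illustrative:

```python
import re

# The question's prefix with a bare "." and the same prefix with "\.".
UNESCAPED = re.compile(r"[0-9a-f]{2}:[0-9a-f]{2}.[0-9a-f]")
ESCAPED = re.compile(r"[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]")

# Hypothetical input whose separator is not a dot.
text = "00:01x0"

print(bool(UNESCAPED.match(text)))     # bare dot accepts the "x"
print(bool(ESCAPED.match(text)))       # escaped dot requires a literal "."
print(bool(ESCAPED.match("00:00.4")))  # a real slot id still matches
```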
Further, your current regex would indeed match in single-line mode (dot matches newline) combined with the multi-line flag, which makes the ^ caret and $ dollar match at the start and end of each line. To stop at the first empty line, though, you would need the lazy .*? instead of the greedy .*. However, this would be considerably slower (less efficient) than the pattern introduced above, which needs less backtracking.
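For comparison, a sketch of the question's own approach made to work with the lazy quantifier, using Python's re.DOTALL (single-line) and re.MULTILINE flags. SAMPLE is again a trimmed copy of the question's output, and LAZY_RE is an illustrative name:

```python
import re

# Trimmed copy of the sample output from the question.
SAMPLE = """\
00:00.4 Host bridge: Intel Corporation Ice Lake IEH
Subsystem: Intel Corporation Device 0000
Flags: fast devsel, NUMA node 0, IOMMU group 3
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00

00:01.0 System peripheral: Intel Corporation Ice Lake CBDMA [QuickData Technology]
Subsystem: Intel Corporation Device 0000
Kernel modules: ioatdma"""

# DOTALL lets ".*?" cross newlines; MULTILINE makes "^" and "$" match at line
# boundaries; the lazy quantifier stops at the first position where "^$"
# matches, i.e. at the start of the first empty line.
LAZY_RE = re.compile(r"[\da-f]{2}:[\da-f]{2}\.[\da-f].*?^$", re.DOTALL | re.MULTILINE)

match = LAZY_RE.search(SAMPLE)
# The match includes the newline that ends the block's last line; strip it.
block = match.group().rstrip("\n") if match else None
print(block)
```

The result is the same four lines as with the (?:\n.+)* pattern; the difference is in how the engine gets there.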
With that approach, change ^$ to (?:^$|\Z) (regex101 demo) if you also want to match the last item when there is no newline at the end of the string (\Z matches only at the very end).
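A small Python check of that difference, assuming the string ends right after "Kernel modules: ioatdma" with no trailing newline; the names blank_only and blank_or_end are illustrative:

```python
import re

# Trimmed copy of the sample output from the question, with no trailing
# newline, so the last block is not followed by an empty line.
SAMPLE = """\
00:00.4 Host bridge: Intel Corporation Ice Lake IEH
Subsystem: Intel Corporation Device 0000
Flags: fast devsel, NUMA node 0, IOMMU group 3
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00

00:01.0 System peripheral: Intel Corporation Ice Lake CBDMA [QuickData Technology]
Subsystem: Intel Corporation Device 0000
Kernel modules: ioatdma"""

FLAGS = re.DOTALL | re.MULTILINE
PREFIX = r"[\da-f]{2}:[\da-f]{2}\.[\da-f]"

# Stops only at an empty line, so the last block is never matched here.
blank_only = re.compile(PREFIX + r".*?^$", FLAGS)
# Stops at an empty line or at the very end of the string.
blank_or_end = re.compile(PREFIX + r".*?(?:^$|\Z)", FLAGS)

print(len(blank_only.findall(SAMPLE)))
print(len(blank_or_end.findall(SAMPLE)))
```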