Reputation: 8178
I'd like to grab the various sections in my code with regular expressions. I want to write four different regexes. The first one is simple: grab the first line that begins with the word extends. The next three need to grab the sections denoted block head, block body, and block scripts.
I'm a bit lost. So far I've got /^block/m
I'm not looking to respect indentation, just using it for my own visual organization.
extends standard
block head
<title>title</title>
<meta name="description" content="A wonderful thing.">
block body
<h1>Title</h1>
<p>A wonderful paragraph...</p>
block scripts
<script src="/javascritps/html5shiv.js"></script>
I need to be able to grab the identifier after the word block.
Also, separately, I need to grab the HTML content after each block ____ statement.
Upvotes: 3
Views: 360
Reputation: 5556
You have a good start. Here is how to do it using a lookbehind: /(?<=^block )\w+\n/mg
See it in action here: https://regex101.com/r/bFhNSO/1
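As a minimal sketch of how that pattern could be used from JavaScript, assuming an engine with lookbehind support (lookbehind was added in ES2018) and a shortened version of the template from the question: the match includes the trailing newline, so each result is trimmed.

```javascript
// Shortened sample of the template from the question (assumed input).
const template =
  "extends standard\nblock head\n<title>title</title>\nblock body\n<h1>Title</h1>\n";

// The lookbehind checks for "block " at the start of a line without
// including it in the match; the match itself is the word plus "\n".
const names = template
  .match(/(?<=^block )\w+\n/mg)
  .map((name) => name.trim());

console.log(names); // [ 'head', 'body' ]
```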
[EDIT] for explanations.
Using a lookbehind is more complex syntax, but it lets you match only the word you need, without the word "block".
Still, if you don't mind the extra text in the match, or if you are working in JS without lookbehind support, you can do the same with:
/^block (\w+)\n/mg
Then read the identifier from capture group 1.
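A small sketch of reading the capture group, assuming a trimmed-down copy of the template from the question; the variable names are only for illustration.

```javascript
// Sample template based on the question (indentation omitted).
const template = [
  "extends standard",
  "block head",
  "<title>title</title>",
  "block body",
  "<h1>Title</h1>",
  "block scripts",
  "<script src=\"/javascritps/html5shiv.js\"></script>",
].join("\n");

// Capture-group variant: group 1 holds the identifier after "block ".
const blockNamePattern = /^block (\w+)\n/mg;

const blockNames = [];
let match;
// With the g flag, exec() resumes after the previous match on each call.
while ((match = blockNamePattern.exec(template)) !== null) {
  blockNames.push(match[1]);
}

console.log(blockNames); // [ 'head', 'body', 'scripts' ]
```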
[EDIT] After question changes.
So for JS, with no lookbehind and also grabbing the HTML all in one regex, you can use something like this: /block (\w+)\n+([\s\S]*?)(?=\s+\nblock|$)/g
See it working here: https://regex101.com/r/bFhNSO/2.
Note that I changed the flavor to js in regex101.
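Here is a rough sketch of how such a regex could be wrapped in a function. parseTemplate is a hypothetical helper, not something from the question. It also makes two assumptions of its own: the extends line is read with a separate /^extends (\w+)/m pattern, and the lookahead uses \s* instead of \s+ so that blocks which follow each other without a blank line (as in the sample in the question) are still split.

```javascript
// Hypothetical helper: parses the template format from the question into
// the parent name and a map of block name -> HTML content.
function parseTemplate(source) {
  // First line starting with "extends"; null if there is none.
  const extendsMatch = /^extends (\w+)/m.exec(source);

  // Variant of the pattern above: \s* instead of \s+ in the lookahead, so
  // a block that directly follows the previous one still ends the match.
  const blockPattern = /block (\w+)\n+([\s\S]*?)(?=\s*\nblock |\s*$)/g;

  const blocks = {};
  let match;
  while ((match = blockPattern.exec(source)) !== null) {
    blocks[match[1]] = match[2]; // group 1: identifier, group 2: content
  }
  return { parent: extendsMatch ? extendsMatch[1] : null, blocks };
}

const sample = [
  "extends standard",
  "block head",
  "<title>title</title>",
  "<meta name=\"description\" content=\"A wonderful thing.\">",
  "block body",
  "<h1>Title</h1>",
  "<p>A wonderful paragraph...</p>",
  "block scripts",
  "<script src=\"/javascritps/html5shiv.js\"></script>",
  "",
].join("\n");

const result = parseTemplate(sample);
console.log(result.parent);              // standard
console.log(Object.keys(result.blocks)); // [ 'head', 'body', 'scripts' ]
```

The sketch does not strip indentation from the content and would be confused by HTML that itself contains a line starting with "block ", so treat it as a starting point only.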
[EDIT] add more details.
g is for global, so you can match multiple instances of the same pattern.
(\w+) captures a word. \w is basically [A-Za-z0-9_], so you may want to change it to something more permissive according to your needs.
([\s\S]*?) captures anything, so it is like the .* that you usually see. In JS (before ES2018) you don't have the s flag that lets . match newline characters too, so the longhand equivalent is [\s\S], matching any \s and any non-\s with \S. The ? makes the quantifier lazy (non-greedy), meaning you want to take the smallest match possible; you can try the regex without it and you will see the difference.
(?=\s+\nblock|$) is a lookahead, allowed in JS, to make sure your previous match is followed by either whitespace, a newline, and the word block, or the end of the document with $. Because of \s+, there must be at least one whitespace character (for example a blank line) before the newline that precedes the next block.
That's it, hope it helps people! :)
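For anyone who wants to try the greedy versus lazy comparison without opening regex101, here is a small sketch. The input string is an assumed two-block sample with a blank line between the blocks.

```javascript
// Assumed sample: two blocks separated by a blank line.
const source = "block head\n<title>t</title>\n\nblock body\n<h1>T</h1>";

// Lazy: [\s\S]*? stops as soon as the lookahead can succeed.
const lazy = /block (\w+)\n+([\s\S]*?)(?=\s+\nblock|$)/g;
// Greedy: [\s\S]* runs to the end of the input first.
const greedy = /block (\w+)\n+([\s\S]*)(?=\s+\nblock|$)/g;

const lazyMatches = source.match(lazy);
const greedyMatches = source.match(greedy);

console.log(lazyMatches.length);   // 2 (one match per block)
console.log(greedyMatches.length); // 1 (everything swallowed by the first block)
```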
Upvotes: 4