Reputation: 13
First, I am new to regular expressions.
I have an XML File that is formatted as so:
<SCHED_TABLE LAST_UPLOAD="" ... TABLE_NAME="">
<JOB
APPLICATION=""
...
NODEID="foo"
...
>
</JOB>
<JOB
APPLICATION=""
...
NODEID="bar"
...
>
</JOB>
</SCHED_TABLE>
With a whole lot of lines in between. What I need to do is write a regular expression to find all the JOBs where NODEID != foo. Every JOB whose NODEID does NOT equal foo will be replaced with a blank, thereby deleting it. The whole job needs to be deleted, including the opening and closing JOB tags.
Any advice for this?
Upvotes: 1
Views: 2844
Reputation: 44831
Try this (untested):
Find:
<JOB[^>]+?NODEID="(?!foo)[^>]+?>.+?</JOB>
Replace with blank.
Make sure the ". matches newline" option (or whatever that option is called -- I'm away from my normal computer) is checked.
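For anyone who wants to try the pattern outside the editor, here is a minimal Python sketch of the same find-and-replace. It assumes that Python's re.DOTALL flag stands in for the ". matches newline" checkbox. The sample data is made up to mirror the layout in the question, and the function name remove_other_jobs is only illustrative.

```python
import re

# Pattern copied from the answer; re.DOTALL is assumed to play the role of
# the ". matches newline" option in the editor.
PATTERN = re.compile(r'<JOB[^>]+?NODEID="(?!foo)[^>]+?>.+?</JOB>', re.DOTALL)

# Hypothetical sample data modeled on the layout shown in the question.
SAMPLE = '''<SCHED_TABLE TABLE_NAME="">
<JOB
APPLICATION=""
NODEID="foo"
>
</JOB>
<JOB
APPLICATION=""
NODEID="bar"
>
</JOB>
</SCHED_TABLE>'''


def remove_other_jobs(xml_text):
    """Replace every JOB element the pattern matches with an empty string."""
    return PATTERN.sub('', xml_text)


cleaned = remove_other_jobs(SAMPLE)
print(cleaned)
```

On this sample, only the JOB with NODEID="foo" should remain. The replacement leaves a blank line where the deleted job used to be.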
Breakdown of how this works:
- <JOB matches the start of a JOB tag.
- [^>]+? matches everything that is not a > symbol, but the ? means "don't be greedy" -- that is, don't use more characters than you need to.
- NODEID=" means match literally those characters.
- (?!foo) is a negative look-ahead pattern. It means, "This is not a match if everything has worked so far but the text following this point is foo."
- [^>]+? is, again, all non-> characters, but not greedy.
- > matches the > character exactly.
- .+? matches any string of characters, but is not greedy (i.e., it stops when it hits the last part of the regex, </JOB>).
- </JOB> matches the closing tag.
Ordinarily, .+? would not match newline characters, which is why you need to tell Notepad++ to let it do so.
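To see the individual pieces at work, the rough Python sketch below checks the pattern against a few hand-made JOB elements. The helpers job and matches are hypothetical. EXACT is not part of the answer: it is an assumed variant that puts the closing quote inside the look-ahead, because (?!foo) on its own also spares any NODEID that merely starts with foo, such as foobar.

```python
import re

# The pattern from the answer.
ORIGINAL = r'<JOB[^>]+?NODEID="(?!foo)[^>]+?>.+?</JOB>'

# Assumed variant (not from the answer): the closing quote inside the
# look-ahead restricts the exclusion to a NODEID of exactly "foo".
EXACT = r'<JOB[^>]+?NODEID="(?!foo")[^>]+?>.+?</JOB>'


def job(node_id):
    """Build a hypothetical multi-line JOB element with the given NODEID."""
    return '<JOB\nAPPLICATION=""\nNODEID="%s"\n>\n</JOB>' % node_id


def matches(pattern, text, flags=re.DOTALL):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text, flags) is not None


print(matches(ORIGINAL, job('bar')))           # job would be deleted
print(matches(ORIGINAL, job('foo')))           # job would be kept
print(matches(ORIGINAL, job('foobar')))        # also kept: look-ahead sees "foo"
print(matches(EXACT, job('foobar')))           # variant deletes it
print(matches(ORIGINAL, job('bar'), flags=0))  # without "dot matches newline"
```

The last call drops the flag. The only text between > and </JOB> in these samples is a newline, which .+? cannot consume on its own, so nothing matches.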
Upvotes: 3