jeff_dude

Reputation: 13

Notepad++ RegEx on XML File

First, I am new to regular expressions.

I have an XML file that is formatted like so:

<SCHED_TABLE LAST_UPLOAD="" ... TABLE_NAME="">
<JOB
  APPLICATION=""
  ...
  NODEID="foo"
  ...
>
</JOB>
<JOB
  APPLICATION=""
  ...
  NODEID="bar"
  ...
>
</JOB>
</SCHED_TABLE>

There are a whole lot of lines in between. What I need to do is write a regular expression that finds every JOB where NODEID != foo. Those that do NOT equal foo will be replaced with a blank, which deletes those jobs. The whole job needs to be deleted, including the opening and closing JOB tags.

Any advice for this?

Upvotes: 1

Views: 2844

Answers (1)

elixenide

Reputation: 44831

Try this (untested):

Find:

<JOB[^>]+?NODEID="(?!foo)[^>]+?>.+?</JOB>

Replace with blank.

Make sure the search mode is set to "Regular expression" and that ". matches newline" is checked.

Breakdown of how this works:

  • <JOB matches the start of a JOB tag.
  • [^>]+? matches one or more characters that are not a > symbol (newlines included, so it can span the attribute lines), but the ? means "don't be greedy" -- that is, don't use more characters than you need to.
  • NODEID=" means match literally those characters.
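  • (?!foo) is a negative look-ahead pattern. It means, "This is not a match if everything has worked so far but the text following this point is foo." Note that it also rejects values that merely start with foo, such as foobar; use (?!foo") if only the exact value foo should be kept.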
  • [^>]+?, again, all non-> characters, but not greedy.
  • > match the > character exactly.
  • .+? matches any string of one or more characters, but don't be greedy (i.e., stop at the first occurrence of the last part of the regex, </JOB>).
  • </JOB> closing tag.
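
The pattern broken down above can also be tried outside Notepad++. Here is a minimal sketch using Python's re module, which is not the engine Notepad++ uses, though the constructs in this pattern behave the same way in both. The sample data and the function name remove_other_jobs are made up for illustration, and re.DOTALL stands in for the ". matches newline" checkbox.

```python
import re

# The answer's pattern, unchanged. re.DOTALL plays the role of
# Notepad++'s ". matches newline" option.
PATTERN = re.compile(r'<JOB[^>]+?NODEID="(?!foo)[^>]+?>.+?</JOB>', re.DOTALL)

# Hypothetical sample data shaped like the file in the question.
SAMPLE = """<SCHED_TABLE TABLE_NAME="">
<JOB
  APPLICATION=""
  NODEID="foo"
>
</JOB>
<JOB
  APPLICATION=""
  NODEID="bar"
>
</JOB>
</SCHED_TABLE>
"""


def remove_other_jobs(xml_text: str) -> str:
    """Delete every JOB element whose NODEID value does not start with foo."""
    return PATTERN.sub("", xml_text)


result = remove_other_jobs(SAMPLE)
print(result)
```

In this sample only the bar job is removed. The line break that followed its closing tag is not part of the match, so an empty line is left where the job used to be.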

Ordinarily, . does not match newline characters, so .+? could not get from the > to a </JOB> on a later line, which is why you need to tell Notepad++ to let it do so.
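
To see the effect of that option, here is a small sketch in Python, where the re.DOTALL flag is the counterpart of ". matches newline". The one-job sample string is invented for the demonstration, and Python's engine is an assumption standing in for Notepad++'s.

```python
import re

REGEX = r'<JOB[^>]+?NODEID="(?!foo)[^>]+?>.+?</JOB>'

# Hypothetical single job with a line break between > and </JOB>.
JOB = '<JOB\n  NODEID="bar"\n>\n</JOB>'

# Without DOTALL, . cannot cross the newline after >, so nothing matches.
without_dotall = re.search(REGEX, JOB)

# With DOTALL, . also matches newlines and the whole job is found.
with_dotall = re.search(REGEX, JOB, re.DOTALL)

print(without_dotall)            # None
print(with_dotall is not None)   # True
```

The [^>]+? parts are not affected by the option, because a negated character class already matches newlines; only the . in .+? needs it.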

Upvotes: 3
