Cotten

Reputation: 9077

multi-line xml regex in sublime

I have a large log file (100,000+ lines) in XML, like so:

<container>
   <request:getApples xml="...">
     ...
   </request:getApples>
   <request:getOranges xml="...">
     ...
   </request:getOranges>
</container>
...

I want to extract the getXXXX part of each request tag, to end up with

getApples
getOranges

by doing a regex find & replace in Sublime Text 2.

Something like

Find:      [^(request:)]*(.*) xml
Replace:   $1\n

Any regex masters that can assist?

Upvotes: 3

Views: 3002

Answers (4)

Haji Rahmatullah

Reputation: 430

Try this

Find what: :(\w+)>|.\s?

Replace with: $1

If it doesn't work as intended, let me know.
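For anyone wondering why a single pass like this can work: the first alternative captures the name in each closing tag (where the name is followed directly by >), while the second alternative consumes every other character plus one optional trailing whitespace character. The sketch below checks that against the sample from the question. It is only a rough check: SAMPLE and single_pass are names made up for illustration, Python's re module stands in for Sublime's regex engine, and it assumes an unmatched $1 expands to nothing (as an unmatched \1 does in Python 3.5+).

```python
import re

# Sample input copied from the question.
SAMPLE = """<container>
   <request:getApples xml="...">
     ...
   </request:getApples>
   <request:getOranges xml="...">
     ...
   </request:getOranges>
</container>"""


def single_pass(text):
    # Alternative 1 keeps the name from ":name>" (closing tags only).
    # Alternative 2 deletes any other character and one following
    # whitespace character; newlines not consumed this way are kept.
    return re.sub(r":(\w+)>|.\s?", r"\1", text)


print(single_pass(SAMPLE))
```

Note that the names come from the closing tags, so any other text of the form :word> in the log would be picked up as well.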

Upvotes: 0

MattDMo

Reputation: 102892

Correcting mart1n's answer and actually using ST2 and your sample input, I came up with the following:

First, press Ctrl+A to select all. Then press Ctrl+H and enter:

Search: .*?(get\w+) .*
Replace: $1

Replace All

Then,

Search: ^(?!get).*$
Replace: nothing

Replace All

Finally,

Search: ^\n
Replace: nothing

Replace All

And you're left with:

getApples
getOranges
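The same three passes can be tried outside the editor. This is a minimal Python sketch, not a claim about how Sublime's engine behaves: SAMPLE and three_pass are illustrative names, and the second pattern is written here with a negative lookahead, ^(?!get).*$, which says "does not start with get" directly.

```python
import re

# Sample input copied from the question.
SAMPLE = """<container>
   <request:getApples xml="...">
     ...
   </request:getApples>
   <request:getOranges xml="...">
     ...
   </request:getOranges>
</container>"""


def three_pass(text):
    # Pass 1: reduce each opening-tag line to its get* name.
    text = re.sub(r".*?(get\w+) .*", r"\1", text)
    # Pass 2: blank out every line that does not start with "get".
    text = re.sub(r"^(?!get).*$", "", text, flags=re.MULTILINE)
    # Pass 3: delete the empty lines left behind.
    return re.sub(r"^\n", "", text, flags=re.MULTILINE)


print(three_pass(SAMPLE))
```

re.MULTILINE is needed in passes 2 and 3 so that ^ and $ anchor at every line rather than only at the start and end of the whole text.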

Upvotes: 1

sharktamer

Reputation: 126

If you're willing to take the problem out of Sublime Text, you can use the dotall flag along with lazy matching to extract only the getXXX parts.

Replacing

.*?(get\w*) .*?

with

$1\n

should get you most of the way, leaving only a few easily removable closing tags at the end of the file, which I can't figure out how to strip at present.

You can check this solution here.

Maybe someone could take this and figure out a way to remove the extra closing tags.
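One way to sidestep the leftover closing tags, if you are outside the editor anyway, is to collect the matches instead of replacing around them. A minimal Python sketch, assuming every opening tag looks like <request:NAME as in the question's sample; extract_request_names is a made-up helper name:

```python
import re

# Sample input copied from the question.
SAMPLE = """<container>
   <request:getApples xml="...">
     ...
   </request:getApples>
   <request:getOranges xml="...">
     ...
   </request:getOranges>
</container>"""


def extract_request_names(xml_text):
    # "<request:" only occurs in opening tags; closing tags start
    # with "</request:", so they are never matched.
    return re.findall(r"<request:(\w+)", xml_text)


print("\n".join(extract_request_names(SAMPLE)))
```

Because nothing is replaced, there is no cruft to clean up afterwards, and no dotall flag is needed.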

Upvotes: 0

mart1n

Reputation: 6213

I'm not familiar with Sublime Text, but you can do it in two steps:

  • Find .*?\(get\w+\) .* and replace with \1. Now those get* strings are on separate lines with nothing else. All that remains is to remove the cruft.

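  • There are many ways to do this. An easy one: find ^[^g][^e][^t].*$ and replace with nothing (an empty string). This only approximates "does not start with get"; where lookaheads are supported, ^(?!get).*$ is more precise.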

Now your file contains just the strings you want plus some empty lines, which (I hope) Sublime can remove with some delete-empty-lines function.

You can quickly throw all of the above in a macro and execute at will for any input following the same format ;-)

Upvotes: 0
