Reputation: 9077
I have a large log file (100,000+ lines) in XML, like so:
<container>
<request:getApples xml="...">
...
</request:getApples>
<request:getOranges xml="...">
...
</request:getOranges>
</container>
...
I want to extract the getXXXX part of each opening tag, reducing the file to:
getApples
getOranges
by doing a regex find & replace in Sublime Text 2.
Something like
Find: [^(request:)]*(.*) xml
Replace: $1\n
Any regex masters that can assist?
Upvotes: 3
Views: 3002
Reputation: 430
Try this
Find what: :(\w+)>|.\s?
Replace with: $1
If it doesn't work as intended, let me know.
Upvotes: 0
Reputation: 102892
Correcting mart1n's answer and actually using ST2 and your sample input, I came up with the following:
First, press Ctrl+A to select all. Then press Ctrl+H and enter:
Search: .*?(get\w+) .*
Replace: $1
Replace All
Then,
Search: ^(?!get).*$
Replace: nothing
Replace All
Finally,
Search: ^\n
Replace: nothing
Replace All
And you're left with:
getApples
getOranges
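For anyone who wants to rehearse the three passes outside the editor, here is a minimal Python sketch of the same idea. The function name three_pass_extract and the SAMPLE constant are made up for illustration, and the second pass uses a negative lookahead to express "lines that do not start with get"; treat it as an approximation of the Sublime steps, not a replacement for them.

```python
import re

# Sample input copied from the question.
SAMPLE = """<container>
<request:getApples xml="...">
...
</request:getApples>
<request:getOranges xml="...">
...
</request:getOranges>
</container>"""


def three_pass_extract(text):
    """Hypothetical helper mirroring the three Replace All passes."""
    # Pass 1: collapse each line containing "getXXXX " to just the captured name.
    text = re.sub(r".*?(get\w+) .*", r"\1", text)
    # Pass 2: blank out every line that does not start with "get".
    text = re.sub(r"^(?!get).*$", "", text, flags=re.MULTILINE)
    # Pass 3: delete the empty lines left behind.
    text = re.sub(r"^\n", "", text, flags=re.MULTILINE)
    return text.splitlines()
```

Calling three_pass_extract(SAMPLE) on the question's sample yields the two names, getApples and getOranges, as a list.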
Upvotes: 1
Reputation: 126
If you're willing to take the problem out of sublime text, you can use the dotall flag along with lazy matching to extract only the getXXX parts.
Replacing
.*?(get\w*) .*?
with
$1\n
should get you most of the way, leaving only a few easily removable closing tags at the end of the file, which I can't figure out how to remove at present.
You can check this solution here.
Maybe someone could take this and figure out a way to remove the extra closing tags.
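One possible way around the leftover tags, if a script is acceptable, is to collect the captured names instead of replacing text, so there is nothing left to clean up. The sketch below is in Python and reuses the capture group from the pattern above; the function name collect_names is hypothetical.

```python
import re


def collect_names(text):
    """Hypothetical helper: return every "getXXXX" that is followed by a space."""
    # findall returns only the captured group, so the closing tags
    # (which have no trailing space after the name) never appear in the result.
    return re.findall(r"(get\w+) ", text)
```

Because the closing tags end in ">" rather than a space, they are skipped without a second pass. The pattern is deliberately loose and would also pick up any other "getXXXX " text in the log, so it may need tightening for real data.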
Upvotes: 0
Reputation: 6213
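I'm not familiar with Sublime Text, but you can do it in two parts:
Find .*?\(get\w+\) .* and replace with \1 (escaped parentheses are the grouping syntax in Emacs- and sed-style regexes; in Sublime Text, write the group unescaped: .*?(get\w+) .*). Now those get* strings are on separate lines with nothing else. All that remains is to remove the cruft.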
There are many ways to do this. An easy one: find ^[^g][^e][^t].*$ and replace with nothing (an empty string).
Now you have your file that contains just the string you want and some empty lines, which (I hope) Sublime can get rid of with some delete-empty-lines function.
You can quickly throw all of the above in a macro and execute at will for any input following the same format ;-)
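If the macro route proves awkward for a file of this size, a small script can play the same role. The following Python sketch streams the log line by line and yields only the names from opening request tags; iter_request_names, OPENING_TAG, and the file path in the usage comment are assumptions for illustration, and it presumes each opening tag sits on its own line as in the sample.

```python
import re
from typing import Iterable, Iterator

# Matches an opening tag such as <request:getApples xml="...">,
# but not the closing </request:getApples>.
OPENING_TAG = re.compile(r"<request:(get\w+)\s")


def iter_request_names(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: yield the getXXXX name from each opening request tag."""
    for line in lines:
        match = OPENING_TAG.search(line)
        if match:
            yield match.group(1)


# Usage sketch (the path is a placeholder, not from the question):
# with open("requests.log", encoding="utf-8") as log:
#     for name in iter_request_names(log):
#         print(name)
```

Reading line by line keeps memory use flat regardless of log size, and no cleanup pass for empty lines is needed because nothing is written for non-matching lines.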
Upvotes: 0