Reputation: 1423
I have a large file of strings containing a lot of "tags" of the form [[STRING]]. I've been trying to extract these tags in Notepad++ using Find and Replace with regular expressions enabled. So far, all I've managed is to match a [[STRING]] tag and capture its contents:
\[\[([^]]+)\]\]
Can anyone provide a regex for Find and Replace that would leave me with just a list of the [[STRING]] tags, one per line?
Thanks
Upvotes: 1
Views: 1147
Reputation:
Keep it simple
Find (?s).*?(?:(\[\[[^\[\]]+\]\])|$)
Replace $1\n
(?s)
.*?
(?:
( # (1 start)
\[\[
[^\[\]]+
\]\]
) # (1 end)
|
$
)
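To try the same Find/Replace outside Notepad++, here is a minimal Python sketch. It assumes Python 3.7+ and the built-in re module, which is a different engine from Notepad++'s Boost regex, so anchors such as $ may not behave identically. The names TAG_OR_END and extract_tags are made up for illustration.

```python
import re

# The Find pattern from this answer. (?s) lets . match newlines; the lazy .*?
# skips text up to the next [[...]] tag (captured in group 1) or up to the end
# of the string ($), so any text after the last tag is consumed as well.
TAG_OR_END = re.compile(r"(?s).*?(?:(\[\[[^\[\]]+\]\])|$)")


def extract_tags(text):
    # Replace with $1\n. Python spells the backreference \1; when group 1 did
    # not participate (the $ branch), it is substituted as an empty string.
    return TAG_OR_END.sub(r"\1\n", text)


sample = "foo [[A]] bar\nbaz [[B]] qux"
result = extract_tags(sample)
print([line for line in result.splitlines() if line])  # ['[[A]]', '[[B]]']
```

Because the $ branch turns the text after the last tag into a bare newline, the result may end with extra empty lines; the list comprehension on the last line simply ignores them.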
Yours is not as efficient as mine. – Wiktor Stribiżew
Regex1: (?s).*?(?:(\[\[[^\[\]]+\]\])|$)
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 3
Elapsed Time: 0.29 s, 290.80 ms, 290799 µs
Regex2: (\[\[[^]]+]])|[^[]*(?:\[(?!\[)[^[]*)*
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 5
Elapsed Time: 0.68 s, 677.31 ms, 677309 µs
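A rough way to rerun a comparison like this yourself is sketched below with Python's re and timeit. It does not use the tool that produced the numbers above, and the input text is invented, so expect different absolute timings (and possibly a different ranking). The helpers tags and time_pattern are hypothetical names.

```python
import re
import timeit

# The two patterns benchmarked above, with [ and ] escaped inside character
# classes so they read unambiguously in Python.
REGEX1 = re.compile(r"(?s).*?(?:(\[\[[^\[\]]+\]\])|$)")
REGEX2 = re.compile(r"(\[\[[^\]]+\]\])|[^\[]*(?:\[(?!\[)[^\[]*)*")

# Made-up input: 1000 lines, each with one tag and some filler text that
# includes single brackets.
TEXT = "\n".join(f"some text [[tag{i}]] more [text]" for i in range(1000))


def tags(pattern, text):
    # Run the Find/Replace ($1\n) and drop empty lines, so both patterns can
    # be compared on the same final output.
    return [line for line in pattern.sub(r"\1\n", text).splitlines() if line]


def time_pattern(pattern, number=20):
    # Total seconds for `number` full replace passes over TEXT.
    return timeit.timeit(lambda: pattern.sub(r"\1\n", TEXT), number=number)


if __name__ == "__main__":
    # Sanity check first: a benchmark only means something if outputs agree.
    assert tags(REGEX1, TEXT) == tags(REGEX2, TEXT)
    for name, pat in (("Regex1", REGEX1), ("Regex2", REGEX2)):
        print(f"{name}: {time_pattern(pat):.3f} s")
```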
This should also work: (?s).*?(\[\[[^\[\]]+\]\]|$) (I don't think you need to put it into a non-capturing group). – ccf
It works, but it makes no difference to performance:
Regex1: (?s).*?(?:(\[\[[^\[\]]+\]\])|$)
Options: < none >
Completed iterations: 100 / 100 ( x 1000 )
Matches found per iteration: 3
Elapsed Time: 0.58 s, 580.74 ms, 580737 µs
Regex2: (?s).*?(\[\[[^\[\]]+\]\]|$)
Options: < none >
Completed iterations: 100 / 100 ( x 1000 )
Matches found per iteration: 3
Elapsed Time: 0.59 s, 589.32 ms, 589323 µs
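The two forms also give identical replacement output. A small Python check, assuming Python 3.5+ (where an unmatched group is substituted as an empty string) and a made-up sample string:

```python
import re

# Original form: group 1 only participates when a tag is found.
WITH_NONCAPTURING = re.compile(r"(?s).*?(?:(\[\[[^\[\]]+\]\])|$)")
# ccf's variant: group 1 wraps the whole alternation, so at the end of the
# text it captures an empty string instead of not participating at all.
WITHOUT_NONCAPTURING = re.compile(r"(?s).*?(\[\[[^\[\]]+\]\]|$)")

sample = "x [[one]] y [[two]] z"
a = WITH_NONCAPTURING.sub(r"\1\n", sample)
b = WITHOUT_NONCAPTURING.sub(r"\1\n", sample)
print(a == b)  # True
```

Either way, the $ branch contributes nothing but the newline, which is why the choice does not affect the result.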
Upvotes: 2
Reputation: 626794
You can use an alternation of your pattern with a negated version:
(\[\[[^]]+]])|(?:(?!\[\[[^]]+]]).)+
And replace with $1\n (see the regex demo). The ". matches newline" option should be enabled. If the performance is not great with this one, use an unrolled version:
(\[\[[^]]+]])|[^[]*(?:\[(?!\[)[^[]*)*
See the regex demo
The (?:(?!\[\[[^]]+]]).)+ part is a tempered greedy token: it works like a negated character class, but for a sequence of characters rather than single characters (just as (?:(?!abc).)+ matches any run of text that does not contain "abc").
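The difference from a real negated character class is easy to see in a small Python sketch using the generic "abc" example (the input string is made up):

```python
import re

text = "xaxabcy"

# Tempered greedy token: each . may only be consumed where "abc" does not
# start, so a match stops right before a complete "abc" sequence.
tempered = re.findall(r"(?:(?!abc).)+", text)

# Negated character class: excludes each single character a, b and c.
negated_class = re.findall(r"[^abc]+", text)

print(tempered)       # ['xax', 'bcy']
print(negated_class)  # ['x', 'x', 'y']
```

Note that findall resumes one character later after failing at the "a" of "abc", so "bc" ends up in the next match. That is why, in the answer's pattern, the tag alternative comes first: it consumes a whole [[...]] tag before the tempered token can start in the middle of it.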
Then, just remove all empty lines (Edit -> Line Operations -> Remove Empty Lines).
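The whole procedure (replace, then remove empty lines) can be sketched in Python for both of the answer's patterns. This assumes Python 3.7+ re, with re.DOTALL standing in for Notepad++'s ". matches newline" option and brackets escaped inside character classes for clarity; extract is a hypothetical helper name.

```python
import re

# Tempered greedy token version; DOTALL lets . cross line breaks.
TEMPERED = re.compile(r"(\[\[[^\]]+\]\])|(?:(?!\[\[[^\]]+\]\]).)+", re.DOTALL)
# Unrolled version; it uses no ., so it needs no DOTALL flag.
UNROLLED = re.compile(r"(\[\[[^\]]+\]\])|[^\[]*(?:\[(?!\[)[^\[]*)*")


def extract(pattern, text):
    # Step 1: replace with $1\n (\1 in Python). Tags survive; every other
    # run of text collapses to a bare newline.
    replaced = pattern.sub(r"\1\n", text)
    # Step 2: the equivalent of Edit -> Line Operations -> Remove Empty Lines.
    return "\n".join(line for line in replaced.splitlines() if line)


sample = "foo [[A]] bar [x]\nbaz [[B]] qux"
print(extract(TEMPERED, sample))
print(extract(UNROLLED, sample))
```

Both calls print [[A]] and [[B]] on separate lines; the single-bracket text [x] is dropped like any other non-tag text.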
Well, you could also use a simpler regex like
(\[\[[^]]+]])|.
and replace with $1\n, but it would add too many line breaks (one for every character outside a tag). Actually, that should not be a problem, since you can remove all the empty lines afterwards. Just use whatever works best for you.
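A short Python sketch of this simpler variant (Python's re, with re.DOTALL in place of ". matches newline"; the sample text is made up) shows where the extra line breaks come from and how dropping empty lines cleans them up:

```python
import re

# Either capture a whole tag, or match any single character (including
# newlines, thanks to DOTALL).
SIMPLE = re.compile(r"(\[\[[^\]]+\]\])|.", re.DOTALL)

sample = "foo [[A]] bar [[B]]"
replaced = SIMPLE.sub(r"\1\n", sample)

# Each of the 9 non-tag characters became its own newline, plus one per tag.
print(replaced.count("\n"))  # 11

# Removing the empty lines leaves just the tags.
tags = [line for line in replaced.splitlines() if line]
print(tags)  # ['[[A]]', '[[B]]']
```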
Upvotes: 2