Peck3277

Reputation: 1423

Regex, select all not between square brackets

I have a large file of strings containing a lot of "tags" of the form [[STRING]]. I've been trying to use Notepad++ to extract these tags using Find and Replace with regular expressions enabled. So far, all I've managed is a pattern that matches a [[STRING]] tag and captures its contents:

\[\[([^]]+)\]\]

Can anyone provide a regex for a search and replace that would leave me with just a list of the [[STRING]] tags, each on its own line?

Thanks

Upvotes: 1

Views: 1147

Answers (2)

user557597

Reputation:

Keep it simple.
Find: (?s).*?(?:(\[\[[^\[\]]+\]\])|$)
Replace: $1\n

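 (?s)                               # dot also matches line breaks
 .*?                                # lazily skip text up to the next tag
 (?:
      (                             # (1 start)
           \[\[
           [^\[\]]+                 # tag body: anything except brackets
           \]\]
      )                             # (1 end)
   |
      $                             # or, if no tag follows, stop at $
 )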
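
For readers who want to try the same replacement outside Notepad++, here is a minimal sketch in Python. It assumes Python's re module behaves closely enough to Notepad++'s engine for this pattern; the helper name extract_tags and the sample text are made up for illustration. Empty lines are filtered out afterwards because the number of line breaks left by the final $ match can differ between engines.

```python
import re

# Same pattern as in the answer: lazily skip text, then capture a tag or stop at $.
TAG_SCAN = re.compile(r"(?s).*?(?:(\[\[[^\[\]]+\]\])|$)")


def extract_tags(text):
    """Hypothetical helper: run the find/replace, then drop the empty lines."""
    # Equivalent of replacing with $1\n; group 1 is None when only $ matched.
    replaced = TAG_SCAN.sub(lambda m: (m.group(1) or "") + "\n", text)
    return [line for line in replaced.split("\n") if line]


sample = "foo [[A]] bar\n[[B C]] baz"
tags = extract_tags(sample)
print(tags)
```

The sketch assumes no tag spans several lines; a tag containing a line break would be split by the final filtering step.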

Yours is not as efficient as mine. – Wiktor Stribiżew

Regex1:   (?s).*?(?:(\[\[[^\[\]]+\]\])|$)
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   3
Elapsed Time:    0.29 s,   290.80 ms,   290799 µs


Regex2:   (\[\[[^]]+]])|[^[]*(?:\[(?!\[)[^[]*)*
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   5
Elapsed Time:    0.68 s,   677.31 ms,   677309 µs

This should also work: (?s).*?(\[\[[^\[\]]+\]\]|$). I don't think you need to wrap it in a non-capturing group. – ccf

It works, but it makes no measurable difference.

Regex1:   (?s).*?(?:(\[\[[^\[\]]+\]\])|$)
Options:  < none >
Completed iterations:   100  /  100     ( x 1000 )
Matches found per iteration:   3
Elapsed Time:    0.58 s,   580.74 ms,   580737 µs


Regex2:   (?s).*?(\[\[[^\[\]]+\]\]|$)
Options:  < none >
Completed iterations:   100  /  100     ( x 1000 )
Matches found per iteration:   3
Elapsed Time:    0.59 s,   589.32 ms,   589323 µs

Upvotes: 2

Wiktor Stribiżew

Reputation: 626794

You can use an alternation of your pattern with a negated version:

(\[\[[^]]+]])|(?:(?!\[\[[^]]+]]).)+
 ^^^^^^^^^^^        ^^^^^^^^^^^

And replace with $1\n. See the regex demo. The ". matches newline" option should be enabled. If performance is poor with this pattern, use an unrolled version:

(\[\[[^]]+]])|[^[]*(?:\[(?!\[)[^[]*)*

See the regex demo

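The (?:(?!\[\[[^]]+]]).)+ construct is a tempered greedy token. It works like a negated character class, but for a sequence of characters rather than a single one: just as (?:(?!abc).)+ matches any text that does not contain "abc", this one matches any text up to the next [[...]] tag.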


Then, just remove all empty lines (Edit -> Line Operations -> Remove Empty Lines).
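
The two patterns above, together with the empty-line cleanup, can be sketched in Python as follows. This is an illustration under assumptions, not the Notepad++ workflow itself: the names TEMPERED, UNROLLED, and tags_via are invented here, re.DOTALL stands in for the ". matches newline" option, and the brackets inside character classes are escaped, which should not change what the patterns match.

```python
import re

# Tag, or a run of text in which no tag starts (tempered greedy token).
TEMPERED = re.compile(r"(\[\[[^\]]+\]\])|(?:(?!\[\[[^\]]+\]\]).)+", re.DOTALL)
# Unrolled variant: tag, or runs of non-"[" text joined by single "[" characters.
UNROLLED = re.compile(r"(\[\[[^\]]+\]\])|[^\[]*(?:\[(?!\[)[^\[]*)*")


def tags_via(pattern, text):
    """Hypothetical helper: replace with group 1 plus a newline, drop empty lines."""
    # Group 1 is None whenever the non-tag alternative matched.
    replaced = pattern.sub(lambda m: (m.group(1) or "") + "\n", text)
    return [line for line in replaced.split("\n") if line]


sample = "intro [[ONE]] a [single] bracket\n[[TWO]] outro"
print(tags_via(TEMPERED, sample))
print(tags_via(UNROLLED, sample))
```

On this sample both variants should leave the same list of tags; they differ only in how the text between tags is consumed.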

You could also use a simpler regex like (\[\[[^]]+]])|. and replace with $1\n, but it adds a line break for every character outside a tag. That should not be a problem, since you can remove all the empty lines afterwards. Use whatever works best for you.
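
To get a feel for how many extra line breaks the simpler pattern produces, here is a small Python sketch. The sample string and the name SIMPLE are assumptions for illustration, and re.DOTALL stands in for the ". matches newline" option.

```python
import re

# Tag, or any single character; each match is replaced by group 1 plus "\n".
SIMPLE = re.compile(r"(\[\[[^\]]+\]\])|.", re.DOTALL)

sample = "ab [[X]] cd"
replaced = SIMPLE.sub(lambda m: (m.group(1) or "") + "\n", sample)

# One line break per character outside the tag, plus one after the tag itself.
line_breaks = replaced.count("\n")
tags = [line for line in replaced.split("\n") if line]
print(line_breaks, tags)
```

Removing the empty lines afterwards leaves only the tag, which is why the extra line breaks are harmless in practice.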

Upvotes: 2
