felixleo22

Reputation: 107

Regex: get everything that is not inside square brackets

I am trying to extract everything that is not inside square brackets with a regex in OpenRefine, but I'm stuck.

Here is what I have so far:

/^([a-z]+)\[[a-z]+\]([a-z]+)/

but I can't "repeat" my rule so that it applies in all of these cases.

Here are my test strings:

abcd[zz]efgh[zz]ijkl[zz]
# I want: abcd efgh ijkl

abcd[zz]efgh[zz]ijkl
# I want: abcd efgh ijkl

abcd[zz]efgh
# I want: abcd efgh

abcd[zz]
# I want: abcd

[zz]abcd
# I want: abcd

Thank you in advance

Upvotes: 1

Views: 65

Answers (1)

Wiktor Stribiżew

Reputation: 626903

You can match runs of chars other than [ and ] that are not immediately followed by zero or more non-bracket chars and then a ] char:

(?=([^\]\[]+))\1(?![^\]\[]*])

The trick is also to make the first pattern atomic, so that the engine cannot backtrack into it and return only part of a run. In regex flavors without atomic group syntax, such as JavaScript, an atomic pattern can be emulated with a positive lookahead that captures a pattern, followed right away by a backreference to the captured text.

Details:

  • (?=([^\]\[]+)) - a positive lookahead that captures into Group 1 one or more chars other than [ and ]
  • \1 - the backreference to Group 1 that consumes the text captured into Group 1
  • (?![^\]\[]*]) - a negative lookahead that fails the match if, immediately to the right, there are zero or more chars other than [ and ] and then a ].
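To check the pattern against the test strings from the question, here is a minimal sketch in Python. It assumes Python's re module handles this lookahead-plus-backreference construct the same way as the Java regex engine behind OpenRefine. The names OUTSIDE_BRACKETS and outside_brackets are illustrative only, and the closing ] in the negative lookahead is written with the [^\]\[]* form described in the details and escaped for readability.

```python
import re

# Pattern from the answer: a run of non-bracket chars (made atomic via
# lookahead + backreference) that is not followed by "non-brackets then ]".
OUTSIDE_BRACKETS = re.compile(r"(?=([^\]\[]+))\1(?![^\]\[]*\])")


def outside_brackets(text: str) -> str:
    """Join every run of text that is not inside [...] with single spaces."""
    return " ".join(m.group(0) for m in OUTSIDE_BRACKETS.finditer(text))


samples = [
    "abcd[zz]efgh[zz]ijkl[zz]",
    "abcd[zz]efgh[zz]ijkl",
    "abcd[zz]efgh",
    "abcd[zz]",
    "[zz]abcd",
]

for s in samples:
    print(f"{s!r} -> {outside_brackets(s)!r}")
```

Each sample yields the output requested in the question, for example "abcd efgh ijkl" for the first two strings and "abcd" for the last two.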

Upvotes: 2
