I'm working in FrameMaker and trying to extract definitions from a document glossary with a script. I've run into a problem with my lookbehind assertion that I can't seem to sort out. Glossary entries look like:
ADC........... air data computer
The problem is that each entry may have one or two tabs separating the acronym from the definition. The first tab is rendered as "......". Some glossaries have a second tab that appears as a blank space after the periods and before the definition. The following works fine for glossaries with a single tab.
(?<=\bADC\x08).*
However, if the glossary uses two tabs, the regex picks up the second tab along with the definition. If I change my lookbehind to:
(?<=\bADC\x08\x08).*
It works with two tabs, but not with one. If I change it to:
(?<=\bADC\x08+).*
...which should match one or more tab characters, I get a "Not Found" error. Apparently quantifiers do not work the same way inside assertions as they do elsewhere in a regex.
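The "Not Found" result is consistent with a common engine restriction: many regex engines, Python's re among them, only accept fixed-width lookbehind, so a quantifier such as + inside (?<=...) is not allowed. Whether FrameMaker's engine has exactly this restriction is an assumption. The Python sketch below only reproduces the behaviour, using \x08 as the tab character as in the question's patterns, with sample entries modelled on the ADC example.

```python
import re

# Sample entries modelled on the question's glossary. "\x08" stands in for
# the tab character, as in the question's own patterns (an assumption about
# how FrameMaker exposes tabs to the script).
one_tab = "ADC\x08air data computer"
two_tabs = "ADC\x08\x08air data computer"

# Fixed-width lookbehinds compile, but each one suits only one layout.
single = re.compile(r"(?<=\bADC\x08).*")
double = re.compile(r"(?<=\bADC\x08\x08).*")

print(repr(single.search(one_tab).group()))   # 'air data computer'
print(repr(single.search(two_tabs).group()))  # '\x08air data computer' (second tab leaks in)
print(double.search(one_tab))                 # None (needs exactly two tabs)

# A quantifier makes the lookbehind variable-width, which Python's re rejects
# at compile time with re.error.
try:
    re.compile(r"(?<=\bADC\x08+).*")
except re.error as err:
    print("rejected:", err)
```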
Since you can use a capturing group in the regex to grab just part of the match, you can drop the lookbehind and use
\bADC\x08+(.*)
Details:
- \b - a word boundary
- ADC - the ADC string
- \x08+ - one or more chars with the 08 hex code
- (.*) - Group 1: any zero or more chars other than line break chars, as many as possible.
See the regex demo.
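A minimal Python sketch of the answer's capture-group approach, again assuming \x08 is the tab as in the question. The definition helper, the use of re.escape for arbitrary acronyms, and the sample glossary are illustrative assumptions, not part of FrameMaker's scripting API. Because the tab run is consumed by \x08+ outside any assertion, there is no fixed-width restriction, and one tab or two tabs yield the same group 1.

```python
import re

def definition(text, acronym):
    """Hypothetical helper: return the definition that follows `acronym`.

    Builds the answer's pattern (word boundary, acronym, one or more tabs,
    capture group 1) and returns group 1, or None when there is no entry.
    """
    # The tab run is matched, not asserted, so its length may vary.
    pattern = r"\b" + re.escape(acronym) + r"\x08+(.*)"
    match = re.search(pattern, text)
    return match.group(1) if match else None

# Sample glossary (made up): one entry with one tab, one with two.
glossary = "ADC\x08air data computer\nAGL\x08\x08above ground level\n"

print(definition(glossary, "ADC"))  # air data computer
print(definition(glossary, "AGL"))  # above ground level
```

Since . does not match a line break, group 1 stops at the end of each entry, so several entries can sit in one string.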