Reputation: 579
I have a text file to process, with some example content as follows:
[FCT-FCTVALUEXXXX-IA]
Name=value
Label = value
Zero or more lines of text
Abbr=
Zero or more lines of text
Field A=1
Field B=0
Zero or more lines of text
Hidden=N
[Text-FCT-FCTVALUEXXXX-IA-Note]
One or more note lines
[FCT-FCT-FCTVALUEZ-IE-DETAIL]
Zero or more lines of text
[FCT-FCT-FCTVALUEQ-IA-DETAIL]
Zero or more lines of text
[FCT-_FCTVALUEY-IA]
Name=value
Zero or more lines of text
Label=value
Zero or more lines of text
Field A=1
Abbr=value
Field A=1
Zero or more lines of text
Hidden=N
I need to find sections like this:
[FCT-FCTVALUEXXXX-IA]
Name=value
Label = value
Zero or more lines of text
Abbr=
Zero or more lines of text
Field A=1
Field B=0
Zero or more lines of text
Hidden=N
and extract FCT-FCTVALUEXXXX-IA, Name, Label, Abbr, Field A, Field B and Hidden, and then find the corresponding section (if it exists):
[Text-FCT-FCTVALUEXXXX-IA-Note]
One or more note lines
and extract the Note lines as a single string.
I don't care about the sections
[FCT-FCT-FCTVALUEZ-IE-DETAIL]
Zero or more lines of text
All three sorts of sections can appear anywhere in the file, including right at the end, and there's no predictable relationship in position between the sections.
The order of Abbr and Fields A and B cannot be guaranteed but they always appear after Name and Label and before Hidden.
What I have so far:
strParse = "(%[FCT%-.-%-)([IF])([EA])%]%c+Name=(.-)%c.-Label=(.-)%c(.-)Hidden=(%a)%c" -- can't pull everything out at once because the order of some fields is not predictable
for id, rt, ft, name, label, detail, hidden in strFacts:gmatch(strParse) do
--extract details
abbr=detail:match("Abbr=(.-)%c") --may be blank
if abbr == nil then abbr = "" end
FieldA = detail:match("Field A=(%d)")
FieldB = detail:match("Field B=(%d)")
--need to sanitise id which could have a bunch of extraneous material tacked on the front and use it to get the Note
ident=id:match(".*(%[FCT%-.-%-$)")..rt..ft
note = ParseAutonote(ident) --this is a function to parse the note which I've yet to test so a dummy function returns ""
tblResults[name]={ident, rt, ft, name, label, abbr, FieldA, FieldB, hidden, note}
end
Most of it works OK (after many hours of working on it), but the piece that isn't working is:
(".*(%[FCT%-.-%-$)")
which is supposed to pull out the final occurrence of FCT-sometext- in the string id.
My logic: anchor the search to the end of the string and capture the shortest possible string beginning with "[FCT-" and ending with "-" at the end of the string.
Given a value of either "[FCT-_ABCD-PDQR-" or "[FCT-XYZ-DETAIL]lines of text[FCT-_ABCD-PDQR-" it returns nil when I want it to return "FCT-_ABCD-PDQR-". (Note ABCD, PDQR etc can be any length of text containing Alpha, - and _).
Upvotes: 2
Views: 196
Reputation: 5021
As you discovered yourself, (".*(%[FCT%-.-%-)$") works the way you want, whereas (".*(%[FCT%-.-%-$)") does not. $ and ^ are anchors: $ must be the very last character of the pattern and ^ the very first. They cannot act as anchors inside a capture group.
When the anchor characters appear anywhere else in the pattern, they are treated as literal characters that must occur in the string you are searching. The exception is ^ as the first character of a set, where it negates the set: for example, [^A-Z] matches any character that is not an upper-case letter.
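A minimal sketch of those rules, using made-up strings rather than anything from your file (the variable names are only for illustration):

```lua
-- "$" as the very last character of the pattern is an anchor:
-- the match has to finish at the end of the subject string.
local anchored = string.match("abc-", "(%a+%-)$")

-- "$" anywhere else (here it is followed by ")") is a literal dollar sign
-- that has to be present in the subject string.
local literal_miss = string.match("abc-", "(%a+%-$)")   -- no "$" in the subject
local literal_hit  = string.match("abc-$", "(%a+%-$)")  -- "$" present in the subject

-- "^" as the first character of a set negates the set.
local no_upper = string.match("ABCdef", "[^A-Z]+")

print(anchored)      --> abc-
print(literal_miss)  --> nil
print(literal_hit)   --> abc-$
print(no_upper)      --> def
```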
Here are examples of the pattern matching, using an example string and the patterns from your question.
print(string.match("[FCT-_ABCD-PDQR-", (".*(%[FCT%-.-%-$)"))) -- initial pattern
> nil
print(string.match("[FCT-_ABCD-PDQR-$", (".*(%[FCT%-.-%-$)"))) -- $ added to end of string
> [FCT-_ABCD-PDQR-$
print(string.match("[FCT-_ABCD-PDQR-", (".*(%[FCT%-.-%-)$"))) -- $ moved to end of pattern
> [FCT-_ABCD-PDQR-
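The same corrected pattern can also be tried against the second, longer value from your question. This is only a sketch: last_fct_prefix is a hypothetical helper name, not something from your code.

```lua
-- Hypothetical helper: returns the last "[FCT-...-" run that ends the string,
-- or nil when the string does not end that way.
local function last_fct_prefix(id)
  -- The greedy ".*" swallows as much as it can, so the capture begins at the
  -- last "[FCT-" from which the rest of the pattern can still match.
  -- The "$" sits outside the capture, as the final character of the pattern,
  -- so it anchors the match to the end of the string.
  return id:match(".*(%[FCT%-.-%-)$")
end

local short_id = last_fct_prefix("[FCT-_ABCD-PDQR-")
local long_id  = last_fct_prefix("[FCT-XYZ-DETAIL]lines of text[FCT-_ABCD-PDQR-")
local no_match = last_fct_prefix("no prefix here")

print(short_id)  --> [FCT-_ABCD-PDQR-
print(long_id)   --> [FCT-_ABCD-PDQR-
print(no_match)  --> nil
```

Note that the capture includes the leading "[". If you want "FCT-_ABCD-PDQR-" without the bracket, moving the opening parenthesis to just after %[ (that is, ".*%[(FCT%-.-%-)$") should do it.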
Upvotes: 1