576i

Reputation: 8372

Why does this AutoHotkey RegExMatch fail on text read from a file but work on a variable?

I'm trying to find out why the RegExMatch command in the code below fails when I use it on a variable read from a file. It works when I assign the file content directly to a variable within AHK.

To test this, open Notepad, copy the multiline content of TableCode2 into it, and save it as c:\temp\testtable.txt

When I run the script, the first message box shows no match; the second one does. I tested this on both 32-bit and 64-bit Windows 7.

Any idea what the difference between the two scenarios is, and why I can't match against the file content?

InputTable = c:\temp\testtable.txt
FileRead, TableCode, %InputTable%

TableCode2 =
(
OBJECT Table 50093 test
{
  OBJECT-PROPERTIES
  {
    Date=22.08.13;
    Time=10:47:20;
  }
  PROPERTIES
  {
  }
  FIELDS
  {
    { 1   ;   ;test                ;Text30         }
  }
  KEYS
  {
    {    ;test                                    ;Clustered=Yes }
  }
  CODE
  {

    BEGIN
    END.
  }
}
)

Needle := "FIELDS(.*)KEYS"
Foundpos := RegExMatch(TableCode,Needle,Out)
msgbox, %Needle%`n %Out1%`n--------------%TableCode%

Foundpos := RegExMatch(TableCode2,Needle,Out)
msgbox, %Needle%`n %Out1%`n--------------%TableCode2%

Upvotes: 1

Views: 2071

Answers (1)

MCL

Reputation: 4085

The dot in AHK regex matches "any single character which is not part of a newline (`r`n) sequence". And that's the tricky part: the newlines in a Windows text file are `r`n by default, whereas a continuation section (the parenthesized block in your code) uses plain `n characters as newlines.
Consequently, RegExMatch with the needle "FIELDS(.*)KEYS" will by default stop consuming when it encounters a `r`n. In your file-based example, that happens immediately after FIELDS, so the regex can never match.
The variable TableCode2, on the other hand, does not contain a single `r`n sequence, which allows the regex to reach KEYS.
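To see the difference in isolation, here is a minimal AutoHotkey v1 sketch. The two sample strings are made up for illustration; they only mimic the line endings of the file content and of the continuation section.

```autohotkey
; Hypothetical sample strings; only the line endings differ.
CrLfText := "FIELDS`r`n  field line`r`nKEYS"   ; `r`n, as in a Windows text file
LfText   := "FIELDS`n  field line`nKEYS"       ; `n only, as in a continuation section

Needle := "FIELDS(.*)KEYS"

; With default options the dot stops at a `r`n sequence, so this should return 0.
PosCrLf := RegExMatch(CrLfText, Needle, OutCrLf)

; A lone `n is not treated as a newline by default, so this should find a match.
PosLf := RegExMatch(LfText, Needle, OutLf)

MsgBox, % "CRLF text position: " PosCrLf "`nLF text position: " PosLf
```

Running InStr(TableCode, "`r`n") on the content read from the file is another quick way to confirm which line endings it contains.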
There are many possible solutions, such as stripping every `r from the file content, but probably the easiest and most consistent one is the DotAll option: "This causes a period (.) to match all characters including newlines (normally, it does not match newlines)."
The resulting regex could look like this: s)FIELDS(.*)KEYS
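Both approaches could look roughly like the following AutoHotkey v1 sketch. The sample string and the variable names Stripped, OutA and OutB are placeholders chosen for illustration, not part of the original script.

```autohotkey
; Hypothetical sample with Windows line endings, standing in for the file content.
Text := "FIELDS`r`n  field line`r`nKEYS"

; Approach 1: the DotAll option s) lets the dot match newline characters too.
PosA := RegExMatch(Text, "s)FIELDS(.*)KEYS", OutA)
; OutA1 now holds everything between FIELDS and KEYS, including the `r`n sequences.

; Approach 2: strip every `r first, then use the original needle unchanged.
StringReplace, Stripped, Text, `r, , All
PosB := RegExMatch(Stripped, "FIELDS(.*)KEYS", OutB)

MsgBox, % "DotAll position: " PosA "`nStripped position: " PosB
```

The DotAll variant leaves the data untouched, which is why it tends to be the less surprising choice; stripping `r changes the text you later work with.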

However, regular expressions shouldn't be used to parse something that is not a regular language. If you have control over the output, use a standardized format like XML or JSON. If you are parsing some kind of programming language, use the existing compiler or interpreter to parse it, and then convert the result.

Upvotes: 2
