Reputation: 8372
I'm trying to find out why the RegExMatch command in the code below fails when I use it on a variable read from a file. It works when I assign the file content directly to a variable within AHK.
To test this, open Notepad, copy the multiline content of TableCode2 into it, and save it as c:\temp\testtable.txt
When I run the script, the first message box shows no match; the second one does. I tested this on both 32-bit and 64-bit Windows 7.
Any idea what the difference between both scenarios is and why I can't match against the file?
InputTable = c:\temp\testtable.txt
FileRead, TableCode, %InputTable%
TableCode2 =
(
OBJECT Table 50093 test
{
OBJECT-PROPERTIES
{
Date=22.08.13;
Time=10:47:20;
}
PROPERTIES
{
}
FIELDS
{
{ 1 ; ;test ;Text30 }
}
KEYS
{
{ ;test ;Clustered=Yes }
}
CODE
{
BEGIN
END.
}
}
)
Needle := "FIELDS(.*)KEYS"
Foundpos := RegExMatch(TableCode,Needle,Out)
msgbox, %Needle%`n %Out1%`n--------------%TableCode%
Foundpos := RegExMatch(TableCode2,Needle,Out)
msgbox, %Needle%`n %Out1%`n--------------%TableCode2%
Upvotes: 1
Views: 2071
Reputation: 4085
The dot in AHK regex matches "any single character which is not part of a newline (`r`n) sequence".
And that's the tricky part: the newlines of a Windows text file are `r`n by default, whereas the continuation section (the parenthesized block) inside the code uses plain `n characters as newlines.
Consequently, RegExMatch with the needle "FIELDS(.*)KEYS" will by default stop consuming when it encounters a `r`n. In your example with the file, this happens immediately after FIELDS, so the regex can never match.
The variable TableCode2, on the other hand, contains not a single `r`n sequence, which allows the regex to reach KEYS.
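The difference can be reproduced without any file. The following is a minimal sketch for AutoHotkey v1; the two sample strings and the variable names are made up for illustration, with one string using `r`n line endings (as a Windows text file would) and the other using `n only (as a continuation section produces).

```autohotkey
; Same text twice: CRLF line endings versus LF-only line endings.
; (Hypothetical sample data, shortened from the table in the question.)
crlfText := "FIELDS`r`n{`r`n}`r`nKEYS"
lfText   := "FIELDS`n{`n}`nKEYS"

needle := "FIELDS(.*)KEYS"

; With default options the dot refuses to cross a `r`n sequence,
; but it does consume a lone `n.
posCrlf := RegExMatch(crlfText, needle, outCrlf)
posLf   := RegExMatch(lfText, needle, outLf)

MsgBox, % "CRLF text: FoundPos = " posCrlf "`nLF text: FoundPos = " posLf
```

The CRLF variant should report 0 (no match), while the LF variant should report a match at position 1.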
There are many possible solutions, such as stripping each `r from the file content, but probably the easiest and most consistent way is to use the DotAll option. "This causes a period (.) to match all characters including newlines (normally, it does not match newlines)."
The resulting regex could look like this: s)FIELDS(.*)KEYS
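Applied to the script from the question, the change might look like the sketch below (AutoHotkey v1). It assumes the file from the question exists at c:\temp\testtable.txt; the message texts are only illustrative.

```autohotkey
InputTable := "c:\temp\testtable.txt"   ; path taken from the question
FileRead, TableCode, %InputTable%

; s) turns on DotAll, so the dot also matches the `r`n sequences in the file.
Needle := "s)FIELDS(.*)KEYS"
FoundPos := RegExMatch(TableCode, Needle, Out)

if (FoundPos)
    MsgBox, Match at position %FoundPos%:`n%Out1%
else
    MsgBox, No match. Check that %InputTable% exists and contains the table.
```

If you would rather normalize the line endings than change the regex, FileRead also accepts a *t option (FileRead, TableCode, *t %InputTable%) that converts `r`n to `n while reading, which should make the original needle behave the same way for the file as it does for TableCode2.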
However, regular expressions should not be used to parse something that is not a regular language. If you have control over the output, use a standardized format like XML or JSON. If you are parsing some kind of programming language, use the existing compiler/interpreter to parse it, and then convert the result.
Upvotes: 2