Reputation: 397
I have a string and I am trying to extract a particular section from it using a Lua pattern match. I saved this as a regex, which you can see here, along with the string and the regex syntax that extracts the exact section I want (the green capture group). I have converted this to the equivalent Lua pattern syntax, which is:
result = {string.match(description, "Weapons.-\n(.*)\n\n")}
but it errors saying "pattern too complex". The weird thing is that when I tried to troubleshoot this, assuming I had made a mistake in the conversion, I found that if I remove the last \n it does work, but it also captures the abilities section, which is undesirable. I think my syntax is correct, because when I remove that \n from both the Lua pattern and the regex, they both match the same data... so what goes wrong when I add the two \n's in Lua?
I have tried lots of different ways and I get some weird results, so I am starting to think that this is some kind of bug in Lua itself.
One extra thing I'd like to point out, which may help, is that I am doing this in Tabletop Simulator, which I believe uses MoonSharp (a Lua interpreter). Can anyone advise on what is going on here or how to tweak the pattern to capture the data I want?
thanks,
Upvotes: 0
Views: 540
Reputation: 11201
I have lots of different ways and I get some weird results so I am starting to think that this is some kind of bug in LUA itself.
This seems to be a bug in the underlying MoonSharp implementation. As has already been pointed out in the comments, your pattern runs just fine on large input strings using the official PUC Lua 5.3 implementation:
> description = "[-]Weapons" .. ("."):rep(1e6) .. "\n" .. ("."):rep(1234567) .. "\n\n[-]More Stuff" .. ("."):rep(1e7)
> #string.match(description, "Weapons.-\n(.*)\n\n")
1234567
Considering the unreliable pattern implementation of MoonSharp (the code appears to port the Lua implementation, but I think they're forgetting to increment matchdepth again when the function returns), I'd implement this matching without patterns, either by looping over the lines or by locating the pattern items with find (using plain matching rather than patterns, though).
The following function does exactly this for the fixed pattern "Weapons.-\n(.-)\n\n". Note how the last argument of all find calls is set to true in order to prevent pattern matching:
local function extract_weapons(description)
    -- End index of the literal word "Weapons" (plain find, no pattern)
    local _, end_weapons = description:find("Weapons", 1, true)
    if not end_weapons then return end
    -- First newline after "Weapons": the section starts right after it
    local _, end_newline = description:find("\n", end_weapons + 1, true)
    if not end_newline then return end
    -- First "\n\n" after that newline: the section ends just before it
    local start_newlines = description:find("\n\n", end_newline + 1, true)
    if not start_newlines then return end
    return description:sub(end_newline + 1, start_newlines - 1)
end
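The answer also mentions looping over the lines as an alternative. A minimal sketch of that variant follows; it is not part of the original answer, the name extract_weapons_by_lines is hypothetical, and it assumes the section ends at the first empty line after the "Weapons" header line (close to the "\n\n" rule above, though not identical in every edge case). As above, every find call passes true as its fourth argument, so no pattern matching is involved:

```lua
-- Hypothetical line-based variant: collect the lines after the "Weapons"
-- header line until the first empty line. Every find call uses plain = true,
-- so the pattern matcher is never used.
local function extract_weapons_by_lines(description)
  local _, end_weapons = description:find("Weapons", 1, true)
  if not end_weapons then return nil end

  -- Skip the rest of the header line that contains "Weapons".
  local header_end = description:find("\n", end_weapons + 1, true)
  if not header_end then return nil end

  local lines = {}
  local pos = header_end + 1
  while true do
    local nl = description:find("\n", pos, true)
    if not nl then
      return nil -- no terminating empty line was found
    end
    local line = description:sub(pos, nl - 1)
    if line == "" then
      break -- empty line: end of the weapons section
    end
    lines[#lines + 1] = line
    pos = nl + 1
  end
  -- Join the collected lines back together without a trailing newline.
  return table.concat(lines, "\n")
end
```

For example, on a made-up input such as "[-]Weapons\nsword\nbow\n\n[-]Abilities\nfly\n\n" this returns "sword\nbow". If the problem really lies in MoonSharp's pattern matcher, as suggested above, this path sidesteps it entirely.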
Upvotes: 1
Reputation: 397
So, for anyone else that may come across this issue, here is the fix and why it happens.
As others have said, this is indeed due to a bug in the Lua interpreter used by Tabletop Simulator. TTS does not use native Lua but an interpreter called MoonSharp v2.0. This version has a bug where matching errors out when your pattern (regex) matches a long string. I want to emphasise that last sentence: the limitation is not on the string you're parsing but on the string returned from the match; the original string can be any length (as far as I've seen).
The fix was to put in a workaround. I first split the larger string (see the regex link above for an example string) into individual lines and put them into an array (a table in Lua speak). I then rebuilt the original string by looping through each item in the array and concatenating them. Inside the loop, an if statement looked for the string "Abilities" and exited the loop once it matched. This way I end up with a string identical to the original minus the abilities section. Maybe this workaround will help others who come across this.
Snippet of the code here so you can see the gist of it:
-- This first line gets the data you see in the regex I listed above
local weaponSection = {string.match(description, "Weapons.-\n(.*)\n")}
-- Because MoonSharp's pattern matching is bugged, we have to split the entire weapon section string into lines and then rebuild it
local temptable = {}
local rebuiltWeaponSection = ""
-- Split the larger string line by line and insert each line into an array; this brings the "Abilities" section across, which we don't want but can't exclude because of the bug explained above
for weapon in string.gmatch(weaponSection[1], ".-\n") do
    table.insert(temptable, weapon)
end
-- Now loop through the array and concatenate each line onto a new string
for _, weapon in ipairs(temptable) do
    -- Look for the "Abilities" line and exit the loop when it is found, so the rebuilt string ends up without the abilities section
    if string.match(weapon, "Abilities") then
        break
    else
        rebuiltWeaponSection = rebuiltWeaponSection .. weapon
    end
end
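The rebuilding loop above uses .. on every iteration, which creates a new, longer string each time. A minimal sketch of the same idea using table.concat instead, assuming the lines have already been split into a table (as temptable is above) and still carry their trailing newlines; rebuild_until is a hypothetical helper name and the sample lines are made up:

```lua
-- Hypothetical helper: join lines up to (not including) the first one that
-- contains `marker`. find with plain = true treats the marker literally.
local function rebuild_until(lines, marker)
  local kept = {}
  for _, line in ipairs(lines) do
    if line:find(marker, 1, true) then
      break -- stop at the marker line, e.g. "Abilities"
    end
    kept[#kept + 1] = line
  end
  -- One concatenation at the end instead of one per line.
  return table.concat(kept)
end

-- Usage with made-up lines that keep their "\n":
local example_lines = { "sword\n", "bow\n", "Abilities\n", "fly\n" }
local rebuilt = rebuild_until(example_lines, "Abilities")
```

Here rebuilt is "sword\nbow\n". For a handful of weapon lines the difference is negligible; table.concat mainly matters when many pieces are joined.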
Upvotes: 1
Reputation: 627537
You can use
result = s:match("Weapons.-\n(.-)\n\n")
See the online Lua demo. Details:
- Weapons - a word
- .- - any zero or more chars, as few as possible
- \n - a newline char
- (.-) - Group 1: any zero or more chars, as few as possible
- \n\n - two newline chars.
Upvotes: 0
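To illustrate why a greedy (.*) capture can run past the weapons section while (.-) stops in time, here is a small comparison in standard Lua on a made-up description string (the sample text is hypothetical). (.*) first takes as much as possible and then backs off only until "\n\n" can follow, so it ends at the last blank line; (.-) grows one char at a time and stops at the first one:

```lua
local description = "[-]Weapons\nsword\nbow\n\n[-]Abilities\nfly\n\n"

-- Greedy: (.*) extends to the LAST "\n\n" in the string,
-- so the abilities section ends up inside the capture.
local greedy = description:match("Weapons.-\n(.*)\n\n")
-- greedy == "sword\nbow\n\n[-]Abilities\nfly"

-- Lazy: (.-) stops at the FIRST "\n\n" after the header line.
local lazy = description:match("Weapons.-\n(.-)\n\n")
-- lazy == "sword\nbow"
```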