Reputation: 41493
I've got a file with syntactically correct Lua 5.1 source code.
I've got a position (line and character offset) inside that file.
I need to get an offset in bytes to the closing parenthesis of the innermost function()
body that contains that position (or figure out that the position belongs to the main chunk of the file).
I.e.:
local function foo()
                   ^ result
  print("bar")
    ^ input
end

local foo = function()
                     ^ result
  print("bar")
    ^ input
end

local foo = function() return function()
                                       ^ result
  print("bar")
    ^ input
end end
...And so on.
How do I do that robustly?
Upvotes: 2
Views: 351
Reputation: 4268
EDIT: My original answer did not take the "innermost" requirement into account. The answer below has been updated to handle it.
To make things "robust," there are a few considerations.
First of all, it's important that you skip over string and comment contents, to avoid incorrect output in situations like:
foo = function()
print(" function() ")
-- function()
print("bar")
^ input
end
This can be somewhat difficult, considering Lua's nested string and comment syntax. Consider, for example, a situation where the input begins in a nested string or comment:
foo = function()
print([[
bar = function()
print("baz")
^ input
end
]])
end
Consequently, if you want a completely robust system, it is not acceptable to parse backwards only until you hit the end of a function parameter list, because you may not have gone back far enough to reach a [[ that would invalidate your match. It is therefore necessary to parse the entire file up to your position, unless you're okay with incorrect matches in these unusual situations. If this is an editor plugin, these "incorrect" results may actually be desirable, because they would let the same plugin work on Lua code that is stored as a string literal inside other Lua code.
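To make the long-bracket rule concrete, here is a minimal sketch in Python (my choice of language, the question does not name one). The helper name skip_long_bracket is hypothetical. It assumes the whole file is already in memory as a string and that an unterminated bracket simply runs to the end of the input.

```python
def skip_long_bracket(src: str, i: int) -> int:
    """If a long bracket such as [[ or [==[ opens at src[i], return the index
    just past its matching close (]] or ]==]); otherwise return i unchanged."""
    if i >= len(src) or src[i] != "[":
        return i
    j = i + 1
    while j < len(src) and src[j] == "=":
        j += 1
    if j >= len(src) or src[j] != "[":
        return i  # not a long bracket, e.g. the "[" in t[1]
    level = j - i - 1                    # number of "=" signs in the opener
    close = "]" + "=" * level + "]"      # the close must use the same level
    end = src.find(close, j + 1)
    # Assumption: an unterminated bracket swallows the rest of the input.
    return len(src) if end == -1 else end + len(close)
```

The same helper covers multiline comments if you call it on the character right after a --. Note that a ]] inside a [=[ ... ]=] string does not end it, which is why the number of equals signs has to be tracked.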
Because blocks can be tracked just by matching their opening keywords against end keywords, a full-blown parser isn't needed. You will need to maintain a stack, however, to keep track of scope. With that in mind, all you need to do is step through the source file character-by-character from the beginning, applying the following logic:
If a " or ' is encountered, ignore the characters up to the closing " or '. Be careful to handle escapes like \" and \\.
If a -- is encountered, ignore the characters up to the newline that ends the comment. Be careful to only do this if the comment is not a multiline comment.
If a multiline string opener is encountered (such as [[, [=[, etc.), or a multiline comment opener is encountered (such as --[[ or --[=[, etc.), ignore the characters up to the closing square brackets with the matching number of equals signs between them.
If a word boundary is encountered, check whether the characters following it form a keyword that opens a block terminated by end (for example, if, while, for, function, etc. DO NOT include repeat, which is closed by until). If so, push the position on the scope stack. Take care to push only once per block: while and for loops always contain a do, so if you also push on do (which you need for plain do ... end blocks), do not push on the loop keyword as well. A "word boundary" in this case is any character which could not be used in a Lua identifier (this is to prevent matches in cases like abcfunction()). The beginning of the file is also considered a word boundary.
If a word boundary is encountered and it is followed by end, pop the top element of the stack. If the stack has no elements, complain about a syntax error.
When you finally step forward and reach your "input" position, pop elements from the stack until you find a function scope. If the stack runs out first, the position belongs to the main chunk. Otherwise, step forward from that function's position to the next ), ignoring any ) inside comments (which could theoretically appear in an argument list if it spans multiple lines or contains inline --[[ ]] comments). That position is your result.
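The steps above can be sketched roughly as follows. This is an illustration in Python under a few assumptions, not a tested tool. All the names are mine. The position is assumed to be already converted from line/column to an offset into the string. Offsets are string indices, so decode the file as latin-1 if you need them to be byte offsets. To avoid counting loops twice, the sketch pushes on function, if and do only, since every while and for contains exactly one do.

```python
import re

LONG_OPEN = re.compile(r"\[(=*)\[")            # [[, [=[, [==[, ...
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")   # identifier or keyword
OPENERS = {"function", "if", "do"}             # each is closed by one `end`


def _close_long(src, m):
    """Index just past the long bracket whose opener was matched by m."""
    close = "]" + m.group(1) + "]"
    end = src.find(close, m.end())
    return len(src) if end == -1 else end + len(close)


def skip_literal(src, i):
    """If a string or comment starts at src[i], return the index just past
    it; otherwise return i unchanged."""
    if src.startswith("--", i):
        m = LONG_OPEN.match(src, i + 2)
        if m:                                   # --[[ ... ]] style comment
            return _close_long(src, m)
        nl = src.find("\n", i)                  # single-line comment
        return len(src) if nl == -1 else nl + 1
    c = src[i]
    if c == "[":
        m = LONG_OPEN.match(src, i)
        return _close_long(src, m) if m else i
    if c == '"' or c == "'":
        j = i + 1
        while j < len(src) and src[j] != c:
            j += 2 if src[j] == "\\" else 1     # step over escaped character
        return min(j + 1, len(src))
    return i


def enclosing_function_paren(src, pos):
    """Offset of the ')' closing the parameter list of the innermost function
    containing pos, or None if pos is in the main chunk."""
    pos = min(pos, len(src))
    stack = []                                  # (keyword, offset) pairs
    i = 0
    while i < pos:
        j = skip_literal(src, i)
        if j != i:
            i = j
            continue
        m = WORD.match(src, i)
        if not m:
            i += 1
            continue
        if m.end() > pos:                       # pos is inside this word
            break
        if m.group() in OPENERS:
            stack.append((m.group(), i))
        elif m.group() == "end":
            if not stack:
                raise ValueError("unbalanced 'end' at offset %d" % i)
            stack.pop()
        i = m.end()
    while stack:
        word, start = stack.pop()
        if word != "function":
            continue
        i = start
        while i < len(src):                     # first ')' outside comments
            j = skip_literal(src, i)
            if j != i:
                i = j
            elif src[i] == ")":
                return i
            else:
                i += 1
    return None
```

If the position falls inside a string or comment, the scan jumps past it and stops, so the result reflects the scope in which that string or comment starts. That matches the "parse from the beginning" behaviour described above.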
This should handle every case, including situations where the function syntactic sugar is used, like
function foo()
print("bar")
end
which you did not include in your example but which I imagine you still want to match.
Upvotes: 0