Evelyn Kokemoor

Reputation: 326

Changing the return order of captures in an LPeg pattern?

(I'm using Lua 5.2 and LPeg 0.12)

Suppose I have a pattern P that produces some indeterminate number of captures, possibly none, and I want to create a pattern Q that captures what P captures as well as the position after P, but with that position returned before P's captures. Essentially, if lpeg.match(P * lpeg.Cp(), str, i) results in v1, v2, ..., j, then I want lpeg.match(Q, str, i) to result in j, v1, v2, ....

Is this achievable without having to create a new table every time P is matched?

Mostly I want to do this to simplify some functions that produce iterators. Lua's stateless iterator functions only get one control variable, and it needs to be the first value returned by the iterator function.
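For reference, this is the protocol the generic for uses with a stateless iterator: it calls f(s, ctl) on every step, feeds the first returned value back in as the next ctl, and stops once that value is nil. Here is a minimal sketch that uses string.find in place of LPeg. The names next_word and words are hypothetical.

```lua
-- Hypothetical stateless iterator: the subject string is the invariant
-- state and the position after the previous match is the control
-- variable, so that position has to be the FIRST value returned.
local function next_word(subject, pos)
  -- string.find returns the start index, the end index, then the captures
  local _, stop, word = string.find(subject, "(%a+)", pos)
  if stop == nil then
    return nil -- a nil control value ends the for loop
  end
  return stop + 1, word -- position first, then the capture
end

-- Returns the iterator function, the invariant state and the initial
-- control value, as the generic for expects.
local function words(subject)
  return next_word, subject, 1
end
```

Looping with `for pos, w in words("foo bar") do ... end` yields 4, "foo" and then 8, "bar".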

In a world that let people name the last arguments of a variadic function, I could write:

function pos_then_captures(pattern)
    -- Not valid Lua: naming a parameter after `...` is hypothetical
    local function roll(..., pos)
        return pos, ...
    end
    return (pattern * lpeg.Cp()) / roll
end

Alas. The easy solution is judicious use of lpeg.Ct():

function pos_then_captures(pattern)
    -- exchange the order of two values and unpack the first parameter
    local function exch(a, b)
        return b, unpack(a)
    end
    return (lpeg.Ct(pattern) * lpeg.Cp()) / exch
end

or to have the caller of lpeg.match do a pack/remove/insert/unpack dance. As yucky as the latter sounds, I would probably choose it, because lpeg.Ct() might have unintended consequences for pathological but "correct" arguments to pos_then_captures (for example, a named group capture inside pattern would be stored under its name in the table, where unpack never sees it).
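The caller-side alternative can be sketched in plain Lua. pos_first is a hypothetical helper meant to wrap a call such as lpeg.match(P * lpeg.Cp(), str, i). It relies on table.pack (Lua 5.2+) so that the count survives nil captures, and it reads the last slot directly rather than going through table.remove/table.insert. It still allocates one table per call, which is exactly the cost the question would like to avoid.

```lua
-- Hypothetical caller-side helper: lpeg.match(P * lpeg.Cp(), str, i)
-- returns v1, ..., vn, j; pos_first reorders that to j, v1, ..., vn.
local function pos_first(...)
  -- table.pack stores the argument count in t.n,
  -- so nil captures are counted correctly
  local t = table.pack(...)
  if t.n == 0 then return end
  return t[t.n], table.unpack(t, 1, t.n - 1)
end
```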

Either of these creates a new table every time pattern is successfully matched, which admittedly doesn't matter too much in my application, but is there a way to do this without any pack-unpack magic?

I'm not too familiar with the internals of Lua, but it feels like what I really want to do is pop something from Lua's stack and put it back somewhere else. That doesn't seem like an operation Lua would support directly or efficiently, but maybe it is something LPeg can do in this specific case.

Upvotes: 1

Views: 195

Answers (2)

wqw

Reputation: 11991

You can do it with your original approach, without table captures or match-time captures, like this (a table is still built, but only when there are more than nine values):

function pos_then_captures(pattern)
    local function exch(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, ...)
        if a1 == nil then return end
        if a2 == nil then return a1 end
        if a3 == nil then return a2, a1 end
        if a4 == nil then return a3, a1, a2 end
        if a5 == nil then return a4, a1, a2, a3 end
        if a6 == nil then return a5, a1, a2, a3, a4 end
        if a7 == nil then return a6, a1, a2, a3, a4, a5 end
        if a8 == nil then return a7, a1, a2, a3, a4, a5, a6 end
        if a9 == nil then return a8, a1, a2, a3, a4, a5, a6, a7 end
        if a10 == nil then return a9, a1, a2, a3, a4, a5, a6, a7, a8 end
        local t = { a10, ... }
        return t[#t], a1, a2, a3, a4, a5, a6, a7, a8, a9, unpack(t, 1, #t-1)
    end
    return (pattern * lpeg.Cp()) / exch
end

The following sample usage returns each matched 'a', preceded by the position just after it:

local p = lpeg.P{ (pos_then_captures(lpeg.C'a') + 1) * lpeg.V(1) + -1 }
print(p:match('abababcd'))

-- output: 2       a       4       a       6       a
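A table-free variant, offered as a sketch rather than as part of this answer: exch detects the end of the list by testing for nil, so a nil-valued capture (for example from a function capture whose function returns nil) would cut the result short. Counting the values with select("#", ...) and peeling off all but the last one by recursion avoids that and never allocates a table. The trade-off is that each recursion level re-passes the remaining values, so the work grows quadratically with the number of captures; that is fine for a handful. The names rotate and init are hypothetical.

```lua
-- Return every argument except the last, preserving nils.
-- Not a tail call, so recursion depth and copying grow with the
-- number of values; fine for a handful of captures.
local function init(first, ...)
  if select("#", ...) == 0 then
    return -- `first` was the last argument: drop it
  end
  return first, init(...)
end

-- Move the last argument to the front:
-- rotate(v1, ..., vn, pos) returns pos, v1, ..., vn
local function rotate(...)
  local n = select("#", ...)
  if n == 0 then return end
  -- parentheses adjust select's results to exactly one value
  return (select(n, ...)), init(...)
end
```

It would be plugged in the same way as exch: `return (pattern * lpeg.Cp()) / rotate`.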

Upvotes: 0

Evelyn Kokemoor

Reputation: 326

Match-time captures and upvalues get the job done. This function uses Cmt to record the position while matching, so pos is already set by the time the function capture pattern / prepend is evaluated and puts it in front of pattern's captures.

Cmt = lpeg.Cmt
Cp  = lpeg.Cp

function prepend_final_pos(pattern)
    -- Each call to prepend_final_pos creates a fresh local `pos`,
    -- shared as an upvalue by the two closures below.
    local pos

    -- lpeg.Cmt(patt, func) calls func with the entire subject,
    -- the current position, and then any captures produced by
    -- patt. Ignore the subject and keep the position.
    local function setpos(_, x)
      pos = x

      -- Returning true makes Cmt succeed without consuming input;
      -- returning nothing would make it fail every time.
      return true
    end

    -- Return all of pattern's captures, with pos in front.
    local function prepend(...)
      return pos, ...
    end

    -- Cmt produces only the values setpos returns after its first
    -- result, so the `true` is not itself captured; the `/ 0` just
    -- guarantees that this part contributes no captures.
    return (pattern / prepend) * (Cmt(Cp(), setpos) / 0)
end

Sample session:

> bar = lpeg.C "bar"
> Pbar = prepend_final_pos(bar)
> print(lpeg.match(Pbar, "foobarzok", 4))
7       bar
> foo = lpeg.C "foo" / "zokzokzok"
> Pfoobar = prepend_final_pos(foo * bar)
> print(lpeg.match(Pfoobar, "foobarzok"))
7       zokzokzok       bar

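As intended, the actual captures have no influence on the position returned by the new pattern; only the length of the text matched by the original pattern does. One caveat: the Cmt runs during matching, while pattern / prepend is evaluated only after the whole match succeeds, so if the new pattern matches several times within one lpeg.match call (inside a repetition, for instance), every prepended value would be the last position recorded.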

Upvotes: 1
