Peter R

Reputation: 3516

How to write a lua pattern that is aware of escaped characters?

I want to write a pattern that takes a string like /a/b/c and extracts a, b, and c. Each of a, b, and c is optional, so /// is a valid input. Currently I have this: "^%/(.-)%/(.-)%/(.-)$".

This works, except that if my input is /</>/b/c, I get the matches <, >, b/c. Obviously the second / should be escaped, like this: /<\\/>/b/c. However, that gives me <\, >, b/c.

Is there a way to write this pattern so that /<\\/>/b/c gives me <\/>, b, c? I know I could change the first .- to .+, which would solve this exact case, but it doesn't solve the larger issue (e.g. what if the escaped slash is in section b?).
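For reference, the behavior described above can be reproduced with a short script. This is only a sketch of the three inputs mentioned in the question; the variable name pattern is chosen here for illustration.

```lua
-- The pattern from the question: three lazy captures separated by slashes.
local pattern = "^%/(.-)%/(.-)%/(.-)$"

-- Plain input: each capture is one section.
print(("/a/b/c"):match(pattern))       -- captures: "a", "b", "c"

-- Unescaped slash inside the first section: the split happens too early.
print(("/</>/b/c"):match(pattern))     -- captures: "<", ">", "b/c"

-- Escaping the slash does not help, because the pattern has no notion
-- of escapes: the backslash is just another character before the "/".
print(("/<\\/>/b/c"):match(pattern))   -- captures: "<\", ">", "b/c"
```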

Upvotes: 3

Views: 65

Answers (2)

ESkri

Reputation: 1928

This cannot be done with a single Lua pattern, but you can chain several of them:

local s = "/<\\/>//b\\\\/c"  -- 4 payloads here  (the second one is empty)
for x in s
      :gsub("/", "/\1")       -- make every payload non-empty by prepending string.char(1)
      :gsub("\\(.)", "\2%1")  -- replace magic backslashes with string.char(2)
      :gsub("%f[/\2]/", "\0") -- replace non-escaped slashes with string.char(0)
      :gsub("[\1\2]", "")     -- remove temporary symbols string.char(1) and string.char(2)
      :gmatch"%z(%Z*)"        -- split by string.char(0)
do
   print(x)
end

Output:

</>

b\
c

Or, if you want a single statement instead of a loop:

local s = "/<\\/>/b/c"  -- 3 payloads here
local a, b, c = s
      :gsub("/", "/\1")       -- make every payload non-empty by prepending string.char(1)
      :gsub("\\(.)", "\2%1")  -- replace magic backslashes with string.char(2)
      :gsub("%f[/\2]/", "\0") -- replace non-escaped slashes with string.char(0)
      :gsub("[\1\2]", "")     -- remove temporary symbols string.char(1) and string.char(2)
      :match"%z(%Z*)%z(%Z*)%z(%Z*)"  -- split by string.char(0)
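
If the number of payloads is not known in advance, the same chain can be wrapped in a function that collects the results into a table. The following is a minimal sketch rather than a definitive implementation: the name split_chained is made up for illustration, and it assumes the input never contains the bytes string.char(0), string.char(1), or string.char(2), since those are used as temporary markers.

```lua
-- Hypothetical wrapper around the chained gsub calls shown above.
-- Assumption: the input contains no \0, \1 or \2 bytes.
local function split_chained(s)
  local marked = s
    :gsub("/", "/\1")         -- make every payload non-empty
    :gsub("\\(.)", "\2%1")    -- mark escaped characters with \2
    :gsub("%f[/\2]/", "\0")   -- turn unescaped slashes into \0
    :gsub("[\1\2]", "")       -- drop the temporary markers
  local out = {}
  for payload in marked:gmatch("%z(%Z*)") do  -- split on \0
    out[#out + 1] = payload
  end
  return out
end

local parts = split_chained("/<\\/>//b\\\\/c")
print(#parts)  -- 4
```

Note that, as in the loop version, the escaping backslashes are removed from the returned payloads.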

Upvotes: 3

InSync

Reputation: 10873

To the best of my knowledge, this is not possible with a single Lua pattern.

Normally, in a regex flavor that supports alternation and non-capturing groups (PCRE, for example), a working regex would be:

^/(?:[^\\/]|\\.)*/(?:[^\\/]|\\.)*/(?:[^\\/]|\\.)*$

...where (?:[^\\/]|\\.) means "either a character that is neither \ (the escape) nor / (the delimiter), or an escaped character".

However, Lua patterns have no alternation (|), and quantifiers cannot be applied to groups. As a result, a single pattern has no way to tell normal characters from escaped ones.

The solution is to write your own parser from scratch.
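
A hand-written parser for this format can be quite short. The following is a minimal sketch, not a definitive implementation: the function name split_escaped and the keep_escapes flag are made up for illustration. It assumes that a backslash escapes whatever character follows it, and that anything before the first slash is ignored.

```lua
-- Hypothetical helper: split a "/a/b/c"-style string on unescaped slashes.
-- By default the escaping backslash is dropped from the result;
-- pass keep_escapes = true to keep it (e.g. to get "<\/>").
local function split_escaped(s, keep_escapes)
  local fields = {}       -- completed payloads
  local buf = {}          -- pieces of the payload being built
  local started = false   -- becomes true once the leading "/" is seen
  local i, n = 1, #s
  while i <= n do
    local ch = s:sub(i, i)
    if ch == "\\" and i < n then
      -- Escaped character: take it literally, never as a delimiter.
      buf[#buf + 1] = keep_escapes and s:sub(i, i + 1) or s:sub(i + 1, i + 1)
      i = i + 2
    elseif ch == "/" then
      -- Unescaped slash: close the current payload and start a new one.
      if started then
        fields[#fields + 1] = table.concat(buf)
      end
      started = true
      buf = {}
      i = i + 1
    else
      buf[#buf + 1] = ch
      i = i + 1
    end
  end
  -- The text after the last slash is the final payload.
  if started then
    fields[#fields + 1] = table.concat(buf)
  end
  return fields
end

local a, b, c = table.unpack and table.unpack(split_escaped("/<\\/>/b/c"))
                              or unpack(split_escaped("/<\\/>/b/c"))
print(a, b, c)
```

The table.unpack / unpack fallback is there only so the usage line runs on both Lua 5.1 and later versions. Because the function walks the string once and treats a backslash plus the next character as a unit, an escaped slash works in any section, not just the first.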

Upvotes: 2
