Reputation: 1559
My environment lacks os and several other packages that would allow access to the native filesystem, shell commands or anything like that, so all functionality must be implemented in pure Lua.
I need to write a function that returns true if the input string contains an arbitrary sequence of letters and numbers, as a whole word, repeated one or more times, optionally with punctuation at the beginning or end of the entire matching substring. I use "whole word" in the same sense as the PCRE word boundary \b.
To demonstrate the idea, here's an incorrect attempt using the re module of LuLPeg; it seems to work with negative lookaheads but not negative lookbehinds:
function containsRepeatingWholeWord(input, word)
  return re.match(input:gsub('[%a%p]+', ' %1 '), '%s*[^%s]^0{"' .. word .. '"}+[^%s]^0%s*') ~= nil
end
Here are example strings and the expected return value (the quotes are syntactical as if typed into the Lua interpreter, not literal parts of the string; this is done to make trailing/leading spaces obvious):
" one !tvtvtv! two", word: tv, return value: true
"I'd", word: d, return value: false
"tv", word: tv, return value: true
" tvtv! ", word: tv, return value: true
" epon ", word: nope, return value: false
" eponnope ", word: nope, return value: false
"atv", word: tv, return value: false
If I had a full PCRE regex library I could do this quickly, but I don't because I can't link to C, and I haven't found any pure Lua implementations of PCRE or similar.
I'm not certain whether LPeg is flexible enough (used directly or through its re module) to do what I want, but I'm pretty sure the built-in Lua string functions can't, because their patterns can't repeat a sequence of characters: (tv)+ does not work with Lua's built-in string.match function and similar.
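To make that limitation concrete, here is a small sketch (not part of the attempt above). In Lua patterns a quantifier applies only to a single character class, so the + after a capture group is read as a literal plus sign. The helper count_repeats is a hypothetical name introduced for illustration; it counts consecutive copies of a word by plain substring comparison instead of a pattern.

```lua
-- In "(tv)+", the parentheses are only a capture; the "+" that follows
-- is matched as a literal plus sign, not as "one or more repetitions".
local no_repeat = ("tvtv"):match("(tv)+")  -- subject has no literal "+"
local literal   = ("tv+"):match("(tv)+")   -- "tv" followed by a literal "+"

-- Hypothetical helper: count consecutive copies of `word` in `s`,
-- starting at position `init` (default 1), using plain comparison.
-- Returns the count and the position just past the last copy.
local function count_repeats(s, word, init)
  local pos = init or 1
  local n = 0
  if word == "" then return 0, pos end  -- an empty word would loop forever
  while s:sub(pos, pos + #word - 1) == word do
    n = n + 1
    pos = pos + #word
  end
  return n, pos
end
```

A loop like this can stand in for (tv)+ when only the standard string library is available; the boundary checks before and after the run still have to be done separately.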
Interesting resources I've been scouring to try to figure out how to do this, to no avail:
Upvotes: 3
Views: 458
Reputation: 1807
I think the pattern doesn't work reliably because the %s*[^%s]^0 part matches an optional series of spacing characters followed by non-spacing characters, and then it tries to match the reduplicated word and fails. After that, it doesn't go backwards or forwards in the string to try to match the reduplicated word at another position. The semantics of LPeg and re are very different from those of most regular expression engines, even for things that look similar.
Here's a re-based version. The pattern has a single capture (the reduplicated word), so if the reduplicated word was found, matching returns a string rather than a number.
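local re = require 're'

function f(str, word)
  local patt = re.compile([[
    match_global <- repeated / ( [%s%p] repeated / . )+
    repeated <- { %word+ } (&[%s%p] / !.) ]],
    { word = word })  -- %word in the grammar refers to this entry
  -- The only capture is the reduplicated word, so a string result means it was found.
  return type(patt:match(str)) == 'string'
end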
It is somewhat complex because the vanilla re does not have a way to generate an lpeg.B pattern.
Here's an lpeg version using lpeg.B. LuLPeg also works here.
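local lpeg = require 'lpeg'
lpeg.locale(lpeg)  -- adds lpeg.space, lpeg.punct and the other character classes

-- Match-time check: succeeds only at the very start of the subject.
local function is_at_beginning(_, pos)
  return pos == 1
end

function find_reduplicated_word(str, word)
  -- On Lua 5.2+, rebinding _ENV makes any global lookup below resolve in math,
  -- so type is saved as a local first.
  local type, _ENV = type, math
  local B, C, Cmt, P, V = lpeg.B, lpeg.C, lpeg.Cmt, lpeg.P, lpeg.V
  local non_word = lpeg.space + lpeg.punct
  local patt = P {
    -- Try the word at every position; otherwise skip one character.
    (V 'repeated' + 1)^1,
    -- Preceded by a non-word character or the start of the string...
    repeated = (B(non_word) + Cmt(true, is_at_beginning))
      * C(P(word)^1)
      -- ...and followed by a non-word character or the end of the string.
      * #(non_word + P(-1))
  }
  return type(patt:match(str)) == 'string'
end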
local unpack = table.unpack or unpack  -- Lua 5.1 only has the global unpack

for _, test in ipairs {
  { 'tvtv', true },
  { ' tvtv', true },
  { ' !tv', true },
  { 'atv', false },
  { 'tva', false },
  { 'gun tv', true },
  { '!tv', true },
} do
  local str, expected = unpack(test)
  local result = find_reduplicated_word(str, 'tv')
  -- Only mismatches are reported; silence means every case passed.
  if result ~= expected then
    print(result)
    print(('"%s" should%s match but did%s')
      :format(str, expected and "" or "n't", expected and "n't" or ""))
  end
end
Upvotes: 3
Reputation: 974
Lua patterns are powerful enough.
No LPEG is needed here.
This is your function
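function f(input, word)
  -- Replace each occurrence of the word with a NUL byte, pad with spaces, then look for
  -- a run of NULs with optional punctuation on either side, delimited by whitespace.
  return (" "..input:gsub(word:gsub("%%", "%%%%"), "\0").." "):find"%s%p*%z+%p*%s" ~= nil
end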
This is a test of the function
for _, t in ipairs{
  {input = " one !tvtvtv! two", word = "tv", return_value = true},
  {input = "I'd", word = "d", return_value = false},
  {input = "tv", word = "tv", return_value = true},
  {input = " tvtv! ", word = "tv", return_value = true},
  {input = " epon ", word = "nope", return_value = false},
  {input = " eponnope ", word = "nope", return_value = false},
  {input = "atv", word = "tv", return_value = false},
} do
  assert(f(t.input, t.word) == t.return_value)
end
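For readers who find the one-liner dense, the same idea can be unpacked into named steps. This is only a sketch under stated assumptions: the names escape_pattern and contains_repeating_whole_word are made up for illustration, the marker byte "\1" is swapped in for "\0" and %z (the Lua 5.2 manual marks %z as deprecated), and the input is assumed never to contain that marker byte.

```lua
-- Hypothetical helper: escape every punctuation character so that `s`
-- is treated as a literal string inside a Lua pattern.
local function escape_pattern(s)
  return (s:gsub("%p", "%%%0"))
end

-- Assumption: the byte "\1" never occurs in real input, so it can
-- stand in for "one occurrence of the word".
local MARKER = "\1"

local function contains_repeating_whole_word(input, word)
  if word == "" then return false end  -- an empty word matches nothing useful
  -- Step 1: replace each occurrence of the word with the marker byte.
  local marked = input:gsub(escape_pattern(word), MARKER)
  -- Step 2: pad with spaces so both ends of the string act like whitespace.
  local padded = " " .. marked .. " "
  -- Step 3: look for whitespace, optional punctuation, one or more markers,
  -- optional punctuation, whitespace. A quantifier on a single byte is
  -- something Lua patterns do support.
  return padded:find("%s%p*" .. MARKER .. "+%p*%s") ~= nil
end
```

The design choice is the same as in the one-liner: collapsing the multi-character word into one byte is what lets the + quantifier express "repeated one or more times".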
Upvotes: 2