Reputation: 172
I'm trying to write a split() function in Lua that takes a delimiter of my choice and defaults to a space. The default works fine. The problem starts when I pass a delimiter to the function: for some reason it doesn't return the last substring. The function:
function split(str,sep)
if sep == nil then
words = {}
for word in str:gmatch("%w+") do table.insert(words, word) end
return words
end
return {str:match((str:gsub("[^"..sep.."]*"..sep, "([^"..sep.."]*)"..sep)))} -- BUG!! doesnt return last value
end
I try to run this:
local str = "a,b,c,d,e,f,g"
local sep = ","
t = split(str,sep)
for i,j in ipairs(t) do
print(i,j)
end
and I get:
1 a
2 b
3 c
4 d
5 e
6 f
Can't figure out where the bug is...
Upvotes: 9
Views: 31883
Reputation: 1580
Here is an option for those who do not want to use Lua patterns (regex-like matching):
local function split(str, sep)
assert(type(str) == 'string' and type(sep) == 'string', 'The arguments must be <string>')
if sep == '' then return {str} end
local res, from = {}, 1
repeat
local pos = str:find(sep, from, true) -- plain find: sep is matched literally, not as a pattern
res[#res + 1] = str:sub(from, pos and pos - 1)
from = pos and pos + #sep
until not from
return res
end
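As a rough check of the non-pattern approach, here is a minimal sketch that passes true as the fourth argument of string.find (plain search), so a separator such as "." is taken literally. The name split_plain is only for illustration and is not part of the answer above. Note that empty fields are kept.

```lua
-- Hypothetical helper name; splits on a literal (non-pattern) separator.
local function split_plain(str, sep)
  if sep == "" then return { str } end
  local res, from = {}, 1
  while true do
    -- The 4th argument `true` turns off pattern matching.
    local first, last = str:find(sep, from, true)
    if not first then
      -- No more separators: the rest of the string is the last field.
      res[#res + 1] = str:sub(from)
      break
    end
    res[#res + 1] = str:sub(from, first - 1)
    from = last + 1
  end
  return res
end

local parts = split_plain("1.2..3", ".")
print(#parts, table.concat(parts, "|"))  -- prints: 4	1|2||3
```

Without the plain flag, "." would be read as "any character" and every character would count as a separator.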
Upvotes: 1
Reputation: 630
Here's my go-to split() function:
-- split("a,b,c", ",") => {"a", "b", "c"}
function split(s, sep)
local fields = {}
local sep = sep or " "
local pattern = string.format("([^%s]+)", sep)
string.gsub(s, pattern, function(c) fields[#fields + 1] = c end)
return fields
end
Upvotes: 8
Reputation: 7046
"[^"..sep.."]*"..sep
This is what causes the problem. You are matching a string of characters that are not the separator, followed by the separator. However, the last substring you want to match (g) is not followed by the separator character.
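The quickest way to fix this is to make sure the last substring is also followed by the separator, for example by appending sep to str before building and applying the pattern. (Adding \0 to the character class, as in "[^"..sep.."\0]*"..sep, does not help: \0 does not stand for the beginning or end of the string in Lua patterns, and the pattern still requires a trailing separator.) This way g, which was followed only by the end of the string, is matched like every other field.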
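A minimal sketch of that fix, keeping the question's gsub-then-match idea but appending the separator first. The name split_append is made up for this example, and it assumes sep is a single character with no special meaning in patterns. Since Lua limits the number of captures in one pattern (32 by default), this only works for a small number of fields.

```lua
-- Hypothetical name; same trick as the question's one-liner, with the
-- separator appended so the last field is also followed by it.
-- Assumes sep is a single non-magic character.
local function split_append(str, sep)
  str = str .. sep
  local field = "([^" .. sep .. "]*)" .. sep
  -- gsub rewrites every "field + sep" into a capture group; the outer
  -- parentheses keep only gsub's first return value (the new string).
  local pattern = (str:gsub("[^" .. sep .. "]*" .. sep, field))
  return { str:match(pattern) }
end

local t = split_append("a,b,c,d,e,f,g", ",")
print(#t, t[#t])  -- prints: 7	g
```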
I'd say your approach is overly complicated in general. First, you can just match the individual substrings that do not contain the separator; second, you can do this in a for-loop using the gmatch function:
local result = {}
for field in your_string:gmatch(("[^%s]+"):format(your_separator)) do
table.insert(result, field)
end
return result
EDIT: The above code, made a bit simpler:
local pattern = "[^" .. your_separator .. "]+"
for field in string.gmatch(your_string, pattern) do
-- ...and so on (The rest should be easy enough to understand)
EDIT2: Keep in mind that you should also escape your separators. A separator like % could cause problems if you don't escape it as %%:
function escape(str)
return (str:gsub("([%^%$%(%)%%%.%[%]%*%+%-%?])", "%%%1")) -- parentheses drop gsub's second return value (the match count)
end
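To show how escaping fits together with the gmatch loop, here is a small sketch. The name split_escaped is only for illustration; it assumes a single-character separator and, like the loop above, skips empty fields.

```lua
-- Escape pattern magic characters (same character set as above).
local function escape(str)
  return (str:gsub("([%^%$%(%)%%%.%[%]%*%+%-%?])", "%%%1"))
end

-- Hypothetical name; splits on a single-character separator,
-- skipping empty fields.
local function split_escaped(str, sep)
  local result = {}
  for field in str:gmatch("[^" .. escape(sep) .. "]+") do
    result[#result + 1] = field
  end
  return result
end

local t = split_escaped("50%25%75", "%")
print(#t, table.concat(t, ","))  -- prints: 3	50,25,75
```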
Upvotes: 1
Reputation: 72312
When splitting strings, the easiest way to avoid corner cases is to append the delimiter to the string, when you know the string cannot end with the delimiter:
str = "a,b,c,d,e,f,g"
str = str .. ','
for w in str:gmatch("(.-),") do print(w) end
Alternatively, you can use a pattern with an optional delimiter:
str = "a,b,c,d,e,f,g"
for w in str:gmatch("([^,]+),?") do print(w) end
Actually, we don't need the optional delimiter since we're capturing non-delimiters:
str = "a,b,c,d,e,f,g"
for w in str:gmatch("([^,]+)") do print(w) end
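One difference between these variants is worth checking: they treat empty fields differently. A quick sketch with a made-up input that contains an empty field:

```lua
local str = "a,,b"

-- Appending the delimiter and capturing "(.-),": empty fields are kept.
local kept = {}
for w in (str .. ","):gmatch("(.-),") do kept[#kept + 1] = w end

-- Capturing runs of non-delimiters: empty fields are skipped.
local skipped = {}
for w in str:gmatch("([^,]+)") do skipped[#skipped + 1] = w end

print(#kept, #skipped)  -- prints: 3	2
```

So the appended-delimiter form is the one to use if empty fields matter, for example when parsing CSV-like rows.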
Upvotes: 14