DrorNohi

Reputation: 172

Split string with specified delimiter in Lua

I'm trying to create a split() function in Lua with a delimiter of choice, where the default is a space. The default works fine. The problem starts when I pass a delimiter to the function: for some reason it doesn't return the last substring. The function:

function split(str, sep)
  if sep == nil then
    local words = {}
    for word in str:gmatch("%w+") do table.insert(words, word) end
    return words
  end
  return {str:match((str:gsub("[^"..sep.."]*"..sep, "([^"..sep.."]*)"..sep)))} -- BUG!! doesn't return last value
end

I try to run this:

local str = "a,b,c,d,e,f,g"
local sep = ","
t = split(str,sep)
for i,j in ipairs(t) do
    print(i,j)
end

and I get:

1   a
2   b
3   c
4   d
5   e
6   f

Can't figure out where the bug is...

Upvotes: 9

Views: 31883

Answers (4)

Alex

Reputation: 1580

I'm adding an option for those who don't want to use Lua patterns:

local function split(str, sep)
  assert(type(str) == 'string' and type(sep) == 'string', 'The arguments must be <string>')
  if sep == '' then return {str} end
  
  local res, from = {}, 1
  repeat
    local pos = str:find(sep, from, true)  -- plain = true: sep is matched literally, not as a pattern
    res[#res + 1] = str:sub(from, pos and pos - 1)
    from = pos and pos + #sep
  until not from
  return res
end
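The plain flag is what makes this version pattern-free: string.find treats its second argument as a Lua pattern unless the fourth argument (plain) is true, so an unescaped separator such as "." would match any character. A small check of that behavior:

```lua
local s = "a.b"

-- Without the plain flag, "." is a pattern that matches any character.
local i1 = s:find(".", 1)
print(i1)  --> 1

-- With plain = true, "." is searched for literally.
local i2 = s:find(".", 1, true)
print(i2)  --> 2
```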

Upvotes: 1

nicolas.leblanc

Reputation: 630

Here's my go-to split() function:

-- split("a,b,c", ",") => {"a", "b", "c"}
function split(s, sep)
    local fields = {}
    
    local sep = sep or " "
    local pattern = string.format("([^%s]+)", sep)
    string.gsub(s, pattern, function(c) fields[#fields + 1] = c end)
    
    return fields
end
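One caveat worth knowing, assuming you ever pass a multi-character separator: because sep is placed inside a character class, each of its characters acts as a separator on its own. The sketch below reuses the same pattern construction with a two-character separator:

```lua
-- Same pattern construction as split() above, with a two-character separator.
local pattern = string.format("([^%s]+)", "<>")  -- builds "([^<>]+)"

local fields = {}
string.gsub("1<2>3", pattern, function(c) fields[#fields + 1] = c end)

-- '<' and '>' each split the string, so fields is {"1", "2", "3"}
-- rather than the single field "1<2>3".
print(#fields)  --> 3
```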

Upvotes: 8

DarkWiiPlayer

Reputation: 7046

"[^"..sep.."]*"..sep is what causes the problem. It matches a run of characters that are not the separator, followed by the separator. However, the last substring you want to match (g) is not followed by the separator character.
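To see this concretely, the following check rebuilds the pattern that the question's split() generates for "a,b,c,d,e,f,g" and counts the captures it produces:

```lua
local str, sep = "a,b,c,d,e,f,g", ","

-- Build the pattern exactly as the question's split() does.
local pattern = (str:gsub("[^" .. sep .. "]*" .. sep, "([^" .. sep .. "]*)" .. sep))
print(pattern)  --> ([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),g

-- Only six captures exist; "g" is left as a literal at the end of the pattern.
print(select("#", str:match(pattern)))  --> 6
```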

The quickest way to fix this is to make sure the last field is also followed by a separator, for example by appending sep to the string before building the pattern. Treating \0 as a separator ("[^"..sep.."\0]*"..sep) does not help: a Lua string is not terminated by a \0 character, so nothing follows g, and the pattern still requires a separator after every field.
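A minimal sketch that keeps the question's gsub/match approach but appends one separator to the string first, so g is terminated by a separator as well. The name split_fixed is hypothetical, and like the original it assumes sep is a single character that is not magic in Lua patterns. Since every field becomes a capture, it is also bounded by Lua's capture limit (32 by default), which is one more reason to prefer the gmatch loop.

```lua
-- Hypothetical variant of the question's split(): append sep so the
-- last field is also followed by a separator before building the pattern.
local function split_fixed(str, sep)
  local s = str .. sep
  local pattern = (s:gsub("[^" .. sep .. "]*" .. sep, "([^" .. sep .. "]*)" .. sep))
  return { s:match(pattern) }
end

local t = split_fixed("a,b,c,d,e,f,g", ",")
for i, field in ipairs(t) do
  print(i, field)  -- prints 1 a through 7 g
end
```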

I'd say your approach is overly complicated in general: first, you can just match individual substrings that do not contain the separator; second, you can do this in a for loop using the gmatch function:

local result = {}
for field in your_string:gmatch(("[^%s]+"):format(your_separator)) do
  table.insert(result, field)
end
return result

EDIT: The above code, made a bit simpler:

local result = {}
local pattern = "[^%" .. your_separator .. "]+"  -- "%" escapes a non-alphanumeric separator such as "," or "."
for field in string.gmatch(your_string, pattern) do
  table.insert(result, field)
end

EDIT2: Keep in mind that you should also escape your separators. A separator like % could cause problems if you don't escape it as %%

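function escape(str)
  -- Prefix every Lua pattern magic character with "%"; the parentheses
  -- discard gsub's second return value (the number of substitutions).
  return (str:gsub("([%^%$%(%)%%%.%[%]%*%+%-%?])", "%%%1"))
end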
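Putting the gmatch loop and the escape function together gives a complete split. This is a sketch combining the pieces of this answer, not a tested library function; like the loop above, it treats each character of sep as a separator and skips empty fields.

```lua
-- Escape Lua pattern magic characters; the parentheses keep only
-- gsub's first return value (the escaped string).
local function escape(str)
  return (str:gsub("([%^%$%(%)%%%.%[%]%*%+%-%?])", "%%%1"))
end

-- Split str on sep using gmatch with an escaped character class.
local function split(str, sep)
  local result = {}
  for field in str:gmatch("[^" .. escape(sep) .. "]+") do
    result[#result + 1] = field
  end
  return result
end

local t = split("a%b%c", "%")  -- "%" would break the pattern without escape()
print(#t, t[1], t[3])  --> 3  a  c
```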

Upvotes: 1

lhf

Reputation: 72312

When splitting strings, the easiest way to avoid corner cases is to append the delimiter to the string, when you know the string cannot end with the delimiter:

str = "a,b,c,d,e,f,g"
str = str .. ','
for w in str:gmatch("(.-),") do print(w) end
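The same trick wrapped in a reusable function might look like this. The name split_keep_empty is hypothetical, and the sketch assumes sep is a single punctuation character, so that "%" .. sep matches it literally in a Lua pattern:

```lua
-- Hypothetical helper based on appending the delimiter before matching.
-- Assumes sep is a single punctuation character (e.g. ",", ".", ";").
local function split_keep_empty(str, sep)
  local fields = {}
  for field in (str .. sep):gmatch("(.-)%" .. sep) do
    fields[#fields + 1] = field
  end
  return fields
end

local t = split_keep_empty("a,b,c,d,e,f,g", ",")
print(#t, t[#t])  --> 7  g
```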

Alternatively, you can use a pattern with an optional delimiter:

str = "a,b,c,d,e,f,g"
for w in str:gmatch("([^,]+),?") do print(w) end

Actually, we don't need the optional delimiter since we're capturing non-delimiters:

str = "a,b,c,d,e,f,g"
for w in str:gmatch("([^,]+)") do print(w) end
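The three patterns agree on "a,b,c,d,e,f,g", but they differ when the input contains empty fields: "([^,]+)" (and "([^,]+),?") require at least one non-comma character, so an empty field is skipped, while the append-the-delimiter version keeps it. A small comparison:

```lua
-- Collect every capture produced by gmatch into a table.
local function collect(str, pattern)
  local out = {}
  for w in str:gmatch(pattern) do
    out[#out + 1] = w
  end
  return out
end

local s = "a,,b"  -- an empty field between the two commas

local skipped = collect(s, "([^,]+)")        -- {"a", "b"}
local kept    = collect(s .. ",", "(.-),")   -- {"a", "", "b"}

print(#skipped, #kept)  --> 2  3
```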

Upvotes: 14
