RCIX

Reputation: 39467

Split string in Lua

How do I split a string? There doesn't seem to be a built-in function for this.

Upvotes: 233

Views: 469865

Answers (23)

mcepl

Reputation: 2786

The best answer I know of right now is to treat Penlight as Lua's standard library and use https://lunarmodules.github.io/Penlight/libraries/pl.stringx.html#split

Upvotes: 0

user973713

Reputation:

Use the gmatch() function to capture strings which contain at least one character of anything other than the desired separator. The separator is any whitespace (%s in Lua) by default:

function mysplit(inputstr, sep)
  if sep == nil then
    sep = "%s"
  end
  local t = {}
  for str in string.gmatch(inputstr, "([^"..sep.."]+)") do
    table.insert(t, str)
  end
  return t
end
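
A quick usage sketch of the function above (the sample strings are only illustrative). Because the pattern requires at least one non-separator character, consecutive separators do not produce empty fields:

```lua
-- mysplit repeated from above so the snippet runs on its own
function mysplit(inputstr, sep)
  if sep == nil then
    sep = "%s"
  end
  local t = {}
  for str in string.gmatch(inputstr, "([^" .. sep .. "]+)") do
    table.insert(t, str)
  end
  return t
end

-- default separator: any whitespace
print(table.concat(mysplit("one two  three"), "|"))  --> one|two|three
-- explicit separator: the empty field between the two commas is dropped
print(table.concat(mysplit("a,b,,c", ","), "|"))     --> a|b|c
```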

Upvotes: 185

user23360953

Reputation:

This function extends string with a Python-like split function:

-- string utility functions
string.split = function(self, sep, limit)
    sep = sep or " " -- sep is compared character by character, so it must be a single character
    local parts = {}
    local current = ""
    local count = 0 -- number of splits made so far
    for i = 1, #self do
        local char = string.sub(self, i, i) -- character 'i' of the string
        if char ~= sep then
            -- not a separator: append the character to the current piece
            current = current .. char
        elseif limit == nil or count < limit then
            -- separator found and splits remain: store the piece and start the next one
            count = count + 1
            table.insert(parts, current)
            current = ""
        else
            -- limit reached: keep the separator as part of the last piece
            current = current .. sep
        end
    end
    -- store whatever remains after the last split
    if current ~= "" then
        table.insert(parts, current)
    end
    -- return the table of pieces
    return parts
end

usage:

msg='my favorite string'
msg=msg:split(' ',1)

This results in the table {'my','favorite string'}, as expected.

Upvotes: 0

Hohenheim

Reputation: 585

An approach not seen in the other answers: string.gsub with a callback function.

function str_split(str, sep)
    if sep == nil then
        sep = '%s'
    end 

    local res = {}
    local func = function(w)
        table.insert(res, w)
    end 

    string.gsub(str, '[^'..sep..']+', func)
    return res 
end

Upvotes: 7

darkfrei

Reputation: 586

To split a string into two strings at a given position:

str1 = "helloworld"
str2 = ""
index = 5
str1, str2 = string.sub(str1, 1, index), string.sub(str1, index+1, -1)
print (str1, str2) -- hello world

Upvotes: 0

carschandler

Reputation: 145

Cleanest/simplest solution yet? For splitting on whitespace, that is.

local function split(argstr)
  local args = {}
  for v in string.gmatch(argstr, "%S+") do
    table.insert(args, v)
  end
  return args
end

Upvotes: 0

thisisrandy

Reputation: 3075

There's an example (unexpandTabs) at the end of the Replacements section of Programming in Lua, 4th Ed., Chapter 10, that uses the SOH character (\1) to mark tab columns for later processing. I thought that was a neat idea, so I adapted it to the "match everything except a delimiter character" ideas that many of the answers here use. By preprocessing the input string to replace all matches with \1, we can support arbitrary delimiter patterns, which is something only some answers do, e.g. @norman-ramsey's excellent answer.

I also included an exclude_empty parameter with default behavior just for fun.

Obviously this will produce bad output if the input string contains \1, but that seems extremely unlikely in any case outside of specialized protocol exchanges.

function string:split(pat, exclude_empty)
  pat = pat or "%s+"
  self = self:gsub(pat, "\1")
  local res = {}
  for match in self:gmatch("([^\1]" .. (exclude_empty and "+" or "*") .. ")") do
    res[#res + 1] = match
  end
  return res
end

Upvotes: 0

Sébastien

Reputation: 88

For those coming from exercise 10.1 of the "Programming in Lua" book: it seems clear that we cannot use a notion explained later in the book (iterators) and that the function should accept a separator longer than a single character.

split() uses a trick: it makes the pattern match what is not wanted (the separator), and it returns an empty table for an empty string. The result of plainSplit() is closer to what split returns in other languages.

magic = "([%%%.%(%)%+%-%*%?%[%]%^%$])"

function split(str, sep, plain)
    if plain then sep = string.gsub(sep, magic, "%%%1") end
    
    local N = '\255'
    str = N..str..N
    str = string.gsub(str, sep, N..N)

    local result = {}
    for word in string.gmatch(str, N.."(.-)"..N) do
        if word ~= "" then
            table.insert(result, word)
        end
    end
    return result
end


function plainSplit(str, sep)
    sep = string.gsub(sep, magic, "%%%1")

    local result = {}
    local start = 0
    repeat
        start = start + 1

        local from, to = string.find(str, sep, start)
        from = from and from-1
        
        local word = string.sub(str, start, from)
        table.insert(result, word)

        start = to
    until start == nil

    return result
end


function tableToString(t)
    local ret = "{"
    for _, word in ipairs(t) do
        ret = ret .. '"' .. word .. '", '
    end
    ret = string.sub(ret, 1, -3)
    ret = ret .. "}"

    return #ret > 1 and ret or "{}"
end

function runSplit(func, title, str, sep, plain)
    print("\n" .. title)
    print("str: '"..str.."'")
    print("sep: '"..sep.."'")
    local t = func(str, sep, plain)
    print("-- t = " .. tableToString(t))
end



print("\n\n\n=== Pattern split ===")
runSplit(split, "Exercice 10.1", "a whole new world", " ")
runSplit(split, "With trailing seperator", "  a  whole   new world  ", " ")
runSplit(split, "A word seperator", "a whole new world", " whole ")
runSplit(split, "Pattern seperator", "a1whole2new3world", "%d")
runSplit(split, "Magic characters as plain seperator", "a$.%whole$.%new$.%world", "$.%", true)
runSplit(split, "Control seperator", "a\0whole\1new\2world", "%c")
runSplit(split, "ISO Time", "2020-07-10T15:00:00.000", "[T:%-%.]")

runSplit(split, " === [Fails] with \\255 ===", "a\255whole\0new\0world", "\0", true)

runSplit(split, "How does your function handle empty string?", "", " ")



print("\n\n\n=== Plain split ===")
runSplit(plainSplit, "Exercice 10.1", "a whole new world", " ")
runSplit(plainSplit, "With trailing seperator", "  a  whole   new world  ", " ")
runSplit(plainSplit, "A word seperator", "a whole new world", " whole ")
runSplit(plainSplit, "Magic characters as plain seperator", "a$.%whole$.%new$.%world", "$.%")

runSplit(plainSplit, "How does your function handle empty string?", "", " ")

output

=== Pattern split ===

Exercice 10.1
str: 'a whole new world'
sep: ' '
-- t = {"a", "whole", "new", "world"}

With trailing seperator
str: '  a  whole   new world  '
sep: ' '
-- t = {"a", "whole", "new", "world"}

A word seperator
str: 'a whole new world'
sep: ' whole '
-- t = {"a", "new world"}

Pattern seperator
str: 'a1whole2new3world'
sep: '%d'
-- t = {"a", "whole", "new", "world"}

Magic characters as plain seperator
str: 'a$.%whole$.%new$.%world'
sep: '$.%'
-- t = {"a", "whole", "new", "world"}

Control seperator
str: 'awholenewworld'
sep: '%c'
-- t = {"a", "whole", "new", "world"}

ISO Time
str: '2020-07-10T15:00:00.000'
sep: '[T:%-%.]'
-- t = {"2020", "07", "10", "15", "00", "00", "000"}

 === [Fails] with \255 ===
str: 'a�wholenewworld'
sep: ''
-- t = {"a"}

How does your function handle empty string?
str: ''
sep: ' '
-- t = {}



=== Plain split ===

Exercice 10.1
str: 'a whole new world'
sep: ' '
-- t = {"a", "whole", "new", "world"}

With trailing seperator
str: '  a  whole   new world  '
sep: ' '
-- t = {"", "", "a", "", "whole", "", "", "new", "world", "", ""}

A word seperator
str: 'a whole new world'
sep: ' whole '
-- t = {"a", "new world"}

Magic characters as plain seperator
str: 'a$.%whole$.%new$.%world'
sep: '$.%'
-- t = {"a", "whole", "new", "world"}

How does your function handle empty string?
str: ''
sep: ' '
-- t = {""}

Upvotes: 1

Steve Vermeulen

Reputation: 1455

I found that many of the other answers fail on edge cases (e.g. when the given string contains #, { or } characters, or when the delimiter is a character like % that requires escaping). Here is the implementation that I went with instead:

local function newsplit(delimiter, str)
    assert(type(delimiter) == "string")
    assert(#delimiter > 0, "Must provide non empty delimiter")

    -- Add escape characters if delimiter requires it
    delimiter = delimiter:gsub("[%(%)%.%%%+%-%*%?%[%]%^%$]", "%%%0")

    local start_index = 1
    local result = {}

    while true do
       local delimiter_start, delimiter_end = str:find(delimiter, start_index)

       if delimiter_start == nil then
          table.insert(result, str:sub(start_index))
          break
       end

       table.insert(result, str:sub(start_index, delimiter_start - 1))

       -- continue after the whole delimiter, which may be longer than one character
       start_index = delimiter_end + 1
    end

    return result
end
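
As an alternative to escaping, here is a minimal sketch that passes true as the fourth (plain) argument of string.find, so the delimiter is matched literally. The name split_plain and the sample strings are my own and not part of the answer above:

```lua
-- split_plain is a hypothetical name; the delimiter is matched as plain text
function split_plain(str, delimiter)
  assert(type(delimiter) == "string" and #delimiter > 0, "delimiter must be a non-empty string")
  local result = {}
  local start_index = 1
  while true do
    -- the fourth argument `true` turns off pattern matching
    local from, to = string.find(str, delimiter, start_index, true)
    if not from then
      -- no more delimiters: the rest of the string is the last field
      table.insert(result, string.sub(str, start_index))
      break
    end
    table.insert(result, string.sub(str, start_index, from - 1))
    start_index = to + 1
  end
  return result
end

-- "%." would be a magic sequence in a pattern; here it is just two characters
print(table.concat(split_plain("a%.b%.c", "%."), ","))  --> a,b,c
```

Since nothing is escaped, there is no list of magic characters to keep up to date.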

Upvotes: 0

Graham Gunderson

Reputation: 49

Here is a routine that works in Lua 4.0, returning a table t of the substrings in inputstr delimited by sep:

function string_split(inputstr, sep)
    local inputstr = inputstr .. sep
    local idx, inc, t = 0, 1, {}
    local idx_prev, substr
    repeat 
        idx_prev = idx
        inputstr = strsub(inputstr, idx + 1, -1)    -- chop off the beginning of the string containing the match last found by strfind (or initially, nothing); keep the rest (or initially, all)
        idx = strfind(inputstr, sep)                -- find the 0-based r_index of the first occurrence of separator 
        if idx == nil then break end                -- quit if nothing's found
        substr = strsub(inputstr, 0, idx)           -- extract the substring occurring before the separator (i.e., data field before the next delimiter)
        substr = gsub(substr, "[%c" .. sep .. " ]", "") -- eliminate control characters, separator and spaces
        t[inc] = substr             -- store the substring (i.e., data field)
        inc = inc + 1               -- iterate to next
    until idx == nil
    return t
end

This simple test

inputstr = "the brown lazy fox jumped over the fat grey hen ... or something."
sep = " " 
t = {}
t = string_split(inputstr,sep)
for i=1,15 do
    print(i, t[i])
end

Yields:

--> t[1]=the
--> t[2]=brown
--> t[3]=lazy
--> t[4]=fox
--> t[5]=jumped
--> t[6]=over
--> t[7]=the
--> t[8]=fat
--> t[9]=grey
--> t[10]=hen
--> t[11]=...
--> t[12]=or
--> t[13]=something.

Upvotes: -1

Benjamin Vison

Reputation: 504

Super late to this question, but in case anyone wants a version that lets you limit the number of splits:

-- Split a string into a table using a delimiter and a limit
string.split = function(str, pat, limit)
  local t = {}
  local fpat = "(.-)" .. pat
  local last_end = 1
  local s, e, cap = str:find(fpat, 1)
  while s do
    if s ~= 1 or cap ~= "" then
      table.insert(t, cap)
    end

    last_end = e+1
    s, e, cap = str:find(fpat, last_end)

    if limit ~= nil and limit <= #t then
      break
    end
  end

  if last_end <= #str then
    cap = str:sub(last_end)
    table.insert(t, cap)
  end

  return t
end
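
A short usage sketch of the function above, to show what limit does (the sample string is only illustrative). With limit = 2, two pieces are split off and the remainder is kept as one final piece:

```lua
-- string.split repeated from above so the snippet runs on its own
string.split = function(str, pat, limit)
  local t = {}
  local fpat = "(.-)" .. pat
  local last_end = 1
  local s, e, cap = str:find(fpat, 1)
  while s do
    if s ~= 1 or cap ~= "" then
      table.insert(t, cap)
    end
    last_end = e + 1
    s, e, cap = str:find(fpat, last_end)
    if limit ~= nil and limit <= #t then
      break
    end
  end
  if last_end <= #str then
    table.insert(t, str:sub(last_end))
  end
  return t
end

-- no limit: every comma splits
print(table.concat(("a,b,c,d"):split(","), "|"))     --> a|b|c|d
-- limit = 2: two pieces are split off, the rest stays together
print(table.concat(("a,b,c,d"):split(",", 2), "|"))  --> a|b|c,d
```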

Upvotes: 3

greenage

Reputation: 397

Depending on the use case, this could be useful. It removes all text on either side of the two markers:

b = "This is a string used for testing"

--Removes unwanted text
c = (b:match("a([^/]+)used"))

print (c)

Output:

string

Upvotes: -3

user11464249

Reputation:

You could use the Penlight library. It has a function that splits a string on a delimiter and returns a list.

It implements many of the functions that we need while programming and that are missing in Lua.

Here is a sample of how to use it.

> 
> stringx = require "pl.stringx"
> 
> str = "welcome to the world of lua"
> 
> arr = stringx.split(str, " ")
> 
> arr
{welcome,to,the,world,of,lua}
> 

Upvotes: 6

Jack Taylor

Reputation: 6237

A lot of these answers only accept single-character separators, or don't deal with edge cases well (e.g. empty separators), so I thought I would provide a more definitive solution.

Here are two functions, gsplit and split, adapted from the code in the Scribunto MediaWiki extension, which is used on wikis like Wikipedia. The code is licenced under the GPL v2. I have changed the variable names and added comments to make the code a bit easier to understand, and I have also changed the code to use regular Lua string patterns instead of Scribunto's patterns for Unicode strings. The original code has test cases here.

-- gsplit: iterate over substrings in a string separated by a pattern
-- 
-- Parameters:
-- text (string)    - the string to iterate over
-- pattern (string) - the separator pattern
-- plain (boolean)  - if true (or truthy), pattern is interpreted as a plain
--                    string, not a Lua pattern
-- 
-- Returns: iterator
--
-- Usage:
-- for substr in gsplit(text, pattern, plain) do
--   doSomething(substr)
-- end
local function gsplit(text, pattern, plain)
  local splitStart, length = 1, #text
  return function ()
    if splitStart then
      local sepStart, sepEnd = string.find(text, pattern, splitStart, plain)
      local ret
      if not sepStart then
        ret = string.sub(text, splitStart)
        splitStart = nil
      elseif sepEnd < sepStart then
        -- Empty separator!
        ret = string.sub(text, splitStart, sepStart)
        if sepStart < length then
          splitStart = sepStart + 1
        else
          splitStart = nil
        end
      else
        ret = sepStart > splitStart and string.sub(text, splitStart, sepStart - 1) or ''
        splitStart = sepEnd + 1
      end
      return ret
    end
  end
end

-- split: split a string into substrings separated by a pattern.
-- 
-- Parameters:
-- text (string)    - the string to iterate over
-- pattern (string) - the separator pattern
-- plain (boolean)  - if true (or truthy), pattern is interpreted as a plain
--                    string, not a Lua pattern
-- 
-- Returns: table (a sequence table containing the substrings)
local function split(text, pattern, plain)
  local ret = {}
  for match in gsplit(text, pattern, plain) do
    table.insert(ret, match)
  end
  return ret
end

Some examples of the split function in use:

local function printSequence(t)
  -- unpack was moved to table.unpack in Lua 5.2
  print((table.unpack or unpack)(t))
end

printSequence(split('foo, bar,baz', ',%s*'))       -- foo     bar     baz
printSequence(split('foo, bar,baz', ',%s*', true)) -- foo, bar,baz
printSequence(split('foo', ''))                    -- f       o       o

Upvotes: 11

Faisal Hanif

Reputation: 151

Here is the function:

function split(pString, pPattern)
   local Table = {}  -- NOTE: use {n = 0} in Lua-5.0
   local fpat = "(.-)" .. pPattern
   local last_end = 1
   local s, e, cap = pString:find(fpat, 1)
   while s do
      if s ~= 1 or cap ~= "" then
         table.insert(Table, cap)
      end
      last_end = e+1
      s, e, cap = pString:find(fpat, last_end)
   end
   if last_end <= #pString then
      cap = pString:sub(last_end)
      table.insert(Table, cap)
   end
   return Table
end

Call it like:

list=split(string_to_split,pattern_to_match)

e.g.:

list=split("1:2:3:4",":")


For more go here:
http://lua-users.org/wiki/SplitJoin

Upvotes: 15

Jerome Anthony

Reputation: 8041

Simply splitting on a delimiter:

local str = 'one,two'
local regxEverythingExceptComma = '([^,]+)'
for x in string.gmatch(str, regxEverythingExceptComma) do
    print(x)
end

Upvotes: 5

Diego Pino

Reputation: 11596

Because there is more than one way to skin a cat, here's my approach:

Code:

#!/usr/bin/env lua

local content = [=[
Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna 
aliqua. Ut enim ad minim veniam, quis nostrud exercitation 
ullamco laboris nisi ut aliquip ex ea commodo consequat.
]=]

local function split(str, sep)
   local result = {}
   local regex = ("([^%s]+)"):format(sep)
   for each in str:gmatch(regex) do
      table.insert(result, each)
   end
   return result
end

local lines = split(content, "\n")
for _,line in ipairs(lines) do
   print(line)
end

Output:

Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat.

Explanation:

The gmatch function works as an iterator: it fetches every substring that matches regex. The pattern matches a run of characters up to the next separator.
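
One consequence worth knowing, sketched with a made-up input: since the pattern needs at least one non-separator character, empty lines are skipped rather than returned as empty strings.

```lua
-- split repeated from above so the snippet runs on its own
function split(str, sep)
   local result = {}
   local regex = ("([^%s]+)"):format(sep)
   for each in str:gmatch(regex) do
      table.insert(result, each)
   end
   return result
end

-- the empty line between "first" and "third" does not show up in the result
local lines = split("first\n\nthird\n", "\n")
print(#lines)                     --> 2
print(table.concat(lines, ", "))  --> first, third
```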

Upvotes: 10

intrepidhero

Reputation: 701

I used the above examples to craft my own function. But the missing piece for me was automatically escaping magic characters.

Here is my contribution:

function split(text, delim)
    -- returns an array of fields based on text and delimiter (one character only)
    local result = {}
    local magic = "().%+-*?[]^$"

    if delim == nil then
        delim = "%s"
    elseif string.find(magic, delim, 1, true) then
        -- escape magic
        delim = "%"..delim
    end

    local pattern = "[^"..delim.."]+"
    for w in string.gmatch(text, pattern) do
        table.insert(result, w)
    end
    return result
end
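
A usage sketch with a magic character as the delimiter (the sample strings are only illustrative). Note that the lookup is written here as string.find(magic, delim, 1, true), which checks whether delim occurs in the list of magic characters:

```lua
function split(text, delim)
    -- returns an array of fields based on text and delimiter (one character only)
    local result = {}
    local magic = "().%+-*?[]^$"

    if delim == nil then
        delim = "%s"
    elseif string.find(magic, delim, 1, true) then
        -- delim is one of the magic characters: escape it
        delim = "%" .. delim
    end

    for w in string.gmatch(text, "[^" .. delim .. "]+") do
        table.insert(result, w)
    end
    return result
end

-- "%" alone would make the pattern "[^%]+" malformed; escaped it becomes "[^%%]+"
print(table.concat(split("50%60%70", "%"), " "))  --> 50 60 70
-- default delimiter: whitespace
print(table.concat(split("a b\tc"), " "))         --> a b c
```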

Upvotes: 3

Ivo

Reputation: 23357

I like this short solution

function split(s, delimiter)
    local result = {};
    for match in (s..delimiter):gmatch("(.-)"..delimiter) do
        table.insert(result, match);
    end
    return result;
end
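
A small sketch of how this behaves with consecutive delimiters (the input is made up). Unlike the "[^sep]+" style used in other answers, empty fields are kept. I am assuming a delimiter without magic characters, since it is also used as a Lua pattern:

```lua
function split(s, delimiter)
    local result = {}
    -- appending the delimiter makes sure the last field is also terminated
    for match in (s .. delimiter):gmatch("(.-)" .. delimiter) do
        table.insert(result, match)
    end
    return result
end

local parts = split("a,,b", ",")
print(#parts)                    --> 3
print(table.concat(parts, "|"))  --> a||b
```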

Upvotes: 7

krsk9999

Reputation: 61

You can use this method:

function string:split(delimiter)
  local result = { }
  local from  = 1
  local delim_from, delim_to = string.find( self, delimiter, from  )
  while delim_from do
    table.insert( result, string.sub( self, from , delim_from-1 ) )
    from  = delim_to + 1
    delim_from, delim_to = string.find( self, delimiter, from  )
  end
  table.insert( result, string.sub( self, from  ) )
  return result
end

delimiter = string.split(stringtodelimite,pattern) 

Upvotes: 6

Hugo

Reputation: 1125

If you just want to iterate over the tokens, this is pretty neat:

line = "one, two and 3!"

for token in string.gmatch(line, "[^%s]+") do
   print(token)
end

Output:

one,
two
and
3!

Short explanation: the "[^%s]+" pattern matches every non-empty run of non-whitespace characters.

Upvotes: 40

Norman Ramsey

Reputation: 202715

Just as string.gmatch will find patterns in a string, this function will find the things between patterns:

function string:split(pat)
  pat = pat or '%s+'
  local st, g = 1, self:gmatch("()("..pat..")")
  local function getter(segs, seps, sep, cap1, ...)
    st = sep and seps + #sep
    return self:sub(segs, (seps or 0) - 1), cap1 or sep, ...
  end
  return function() if st then return getter(st, g()) end end
end

By default it returns whatever is separated by whitespace.
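
Note that this split returns an iterator, not a table. A minimal usage sketch that collects the segments (the sample string and variable names are only illustrative):

```lua
-- string:split repeated from above so the snippet runs on its own
function string:split(pat)
  pat = pat or '%s+'
  local st, g = 1, self:gmatch("()("..pat..")")
  local function getter(segs, seps, sep, cap1, ...)
    st = sep and seps + #sep
    return self:sub(segs, (seps or 0) - 1), cap1 or sep, ...
  end
  return function() if st then return getter(st, g()) end end
end

-- collect the segments into a table
local parts = {}
for segment in ("one  two three"):split() do
  parts[#parts + 1] = segment
end
print(table.concat(parts, "|"))  --> one|two|three
```

The iterator also returns the matched separator as a second value, so a loop of the form for segment, sep in s:split(pat) can inspect it.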

Upvotes: 20

gwell

Reputation: 2753

If you are splitting a string in Lua, try the string.gmatch() or string.sub() functions. Use string.sub() if you know the index at which to split the string, or string.gmatch() if you need to parse the string to find where to split it.

Example using string.gmatch() from Lua 5.1 Reference Manual:

 t = {}
 s = "from=world, to=Lua"
 for k, v in string.gmatch(s, "(%w+)=(%w+)") do
   t[k] = v
 end

Upvotes: 45
