A7X

Reputation: 129

Lua split string using specific pattern

I need to split each row of an input file on the specific separator " - ". I'm not far from a solution, but my code currently also splits on single spaces. Each row of the file is formatted as follows:

NAME - ID - USERNAME - GROUP NAME - GROUP ID - TIMESTAMP

The name field may contain spaces, as may the group name and the timestamp. For example, a row like this

LUCKY STRIKE - 11223344 - @lucky - CIGARETTES SMOKERS - 44332211 - 11:42 may/5th

is valid. So these tokenized values should be stored inside a table. Here is my code:

local function splitstring(inputstr)
  sep = "(%s-%s)"
  local t={} ; i=1
  for str in string.gmatch(inputstr, "([^"..sep.."]+)") do
      t[i] = str
      i = i + 1
  end
  print("=========="..t[1].."===========")
  print("=========="..t[2].."===========")
  print("=========="..t[3].."===========")
  return t
end

When I run it, it puts "LUCKY" in the first field, "STRIKE" in the second field, and the ID in the third field. Is there a way to store "LUCKY STRIKE" in the first field, splitting ONLY on the specified pattern? I hope you can help me.

P.S. I have already read the Lua manual, but it didn't help me much...

Upvotes: 3

Views: 610

Answers (2)

lhf

Reputation: 72422

Here is another take:

s="LUCKY STRIKE - 11223344 - @lucky - CIGARETTES SMOKERS - 44332211 - 11:42 may/5th"    
s=s.." - "
for v in s:gmatch("(.-)%s+%-%s+") do
    print("["..v.."]")
end

The pattern reflects the definition of a field: everything up to a - surrounded by whitespace. Here "everything" is implemented with the non-greedy item .- (as few characters as possible). To make this work uniformly, we append the separator to the end of the string as well, so the last field is terminated like all the others. Many pattern matching problems that use separators can benefit from this uniformity.
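Since the question asks for the values in a table, here is a minimal sketch of the same loop wrapped in a function. The name split_fields and the variable names are made up for illustration, and it assumes every separator is a hyphen with whitespace on both sides.

```lua
-- Hypothetical helper: collects the fields of one row into a table.
local function split_fields(line)
  local fields = {}
  -- Append the separator so the last field is terminated like the others.
  for field in (line .. " - "):gmatch("(.-)%s+%-%s+") do
    fields[#fields + 1] = field
  end
  return fields
end

local sample_row = "LUCKY STRIKE - 11223344 - @lucky - CIGARETTES SMOKERS - 44332211 - 11:42 may/5th"
local parsed = split_fields(sample_row)
print(parsed[1])  -- LUCKY STRIKE
print(#parsed)    -- 6
```

Because the separator requires whitespace on both sides of the hyphen, single spaces inside a field such as "LUCKY STRIKE" are left alone.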

Upvotes: 4

Chris Kitching

Reputation: 2655

There are a few things wrong with what you have.

Firstly, - is a repetition symbol in Lua patterns: http://www.lua.org/manual/5.2/manual.html#6.4.1

You need to use %- to get a literal -.

We're not done: The resulting gmatch call is string.gmatch(inputstr, "[^%s%-%s]+"). Since your separator pattern is inside [], it's a character class. It says "Give me all the things that aren't a space or a -, and be as greedy as you can", which is why it stops at the first space character.
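To see the difference, here is a small sketch comparing the same three items inside and outside a character class. The sample string and variable names are only for illustration.

```lua
local sample = "LUCKY STRIKE - 11223344"

-- Inside [], %s and %- are members of a set, not a three-character sequence,
-- so this matches runs of characters that are neither whitespace nor "-".
local tokens = {}
for token in sample:gmatch("[^%s%-]+") do
  tokens[#tokens + 1] = token
end
print(table.concat(tokens, "|"))  -- LUCKY|STRIKE|11223344

-- Outside [], the same items form a sequence: whitespace, hyphen, whitespace.
local first, last = sample:find("%s%-%s")
print(first, last)  -- 13 15
```

The set version breaks "LUCKY STRIKE" apart at the space, while the sequence version only matches the full " - " separator.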

Your best bet is to do something like:

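local function splitstring(inputstr)
  local sep = "%-"      -- escaped, so it matches a literal hyphen
  local t, i = {}, 1    -- locals, so nothing leaks into the global scope
  -- each match is a run of characters that contains no hyphen
  for str in string.gmatch(inputstr, "[^"..sep.."]+") do
      t[i] = str
      i = i + 1
  end
  print("=========="..t[1].."===========")
  print("=========="..t[2].."===========")
  print("=========="..t[3].."===========")
  return t
end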

Which yields:

==========LUCKY STRIKE ===========
========== 11223344 ===========
========== @lucky ===========

... and now you can separately fix the problem of the spaces around the values. Note that this approach also splits on any - that appears inside a field.
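One possible way to handle the spaces is to trim each piece as it is stored. This is only a sketch: trim and split_on_hyphen are hypothetical names, and it assumes no field contains a hyphen of its own.

```lua
-- Hypothetical helper: strips leading and trailing whitespace.
local function trim(s)
  -- The parentheses keep only the first return value of match.
  return (s:match("^%s*(.-)%s*$"))
end

-- Splits on every literal hyphen, then trims each piece.
local function split_on_hyphen(line)
  local fields = {}
  for piece in line:gmatch("[^%-]+") do
    fields[#fields + 1] = trim(piece)
  end
  return fields
end

local row = "LUCKY STRIKE - 11223344 - @lucky - CIGARETTES SMOKERS - 44332211 - 11:42 may/5th"
local fields = split_on_hyphen(row)
print("[" .. fields[1] .. "]")  -- [LUCKY STRIKE]
print("[" .. fields[2] .. "]")  -- [11223344]
```

If a name or group name could contain a hyphen, a pattern that requires the surrounding whitespace is the safer choice.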

Upvotes: 3
