Reputation: 129
I need to split each row of an input file on the specific pattern " - ". I'm not far from a solution, but my code also splits on single spaces. Each row of the file is formatted as follows:
NAME - ID - USERNAME - GROUP NAME - GROUP ID - TIMESTAMP
The name field may contain spaces, as may the group name and the timestamp; for example, a row like this
LUCKY STRIKE - 11223344 - @lucky - CIGARETTES SMOKERS - 44332211 - 11:42 may/5th
is valid. So these tokenized values should be stored inside a table. Here is my code:
local function splitstring(inputstr)
sep = "(%s-%s)"
local t={} ; i=1
for str in string.gmatch(inputstr, "([^"..sep.."]+)") do
t[i] = str
i = i + 1
end
print("=========="..t[1].."===========")
print("=========="..t[2].."===========")
print("=========="..t[3].."===========")
return t
end
When I run it, it puts "lucky" in the first field, "strike" in the second field, and the ID in the third field. Is there a way to store "lucky strike" in the first field, splitting ONLY on the specified pattern? I hope you can help me.
P.S. I have already read the Lua manual, but it didn't help me much...
Upvotes: 3
Views: 610
Reputation: 72422
Here is another take:
s="LUCKY STRIKE - 11223344 - @lucky - CIGARETTES SMOKERS - 44332211 - 11:42 may/5th"
s=s.." - "
for v in s:gmatch("(.-)%s+%-%s+") do
print("["..v.."]")
end
The pattern reflects the definition of the field: everything until a - surrounded by spaces. Here "everything" is implemented using the non-greedy pattern .- (as few characters as possible). To make this work uniformly, we add the separator to the end as well. Many pattern matching problems that use separators can benefit from this uniformity.
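The question asks for the fields to be stored in a table, so here is a minimal sketch that wraps the same pattern in a function. The name split_fields is hypothetical, and the sketch assumes that no field itself contains a hyphen surrounded by spaces:

```lua
-- split_fields is a hypothetical helper name; the pattern is the one used above.
local function split_fields(line)
  local fields = {}
  -- Append a separator so the last field is terminated like all the others.
  for field in (line .. " - "):gmatch("(.-)%s+%-%s+") do
    fields[#fields + 1] = field
  end
  return fields
end

local row = "LUCKY STRIKE - 11223344 - @lucky - CIGARETTES SMOKERS - 44332211 - 11:42 may/5th"
local fields = split_fields(row)
print(#fields)    --> 6
print(fields[1])  --> LUCKY STRIKE
print(fields[4])  --> CIGARETTES SMOKERS
print(fields[6])  --> 11:42 may/5th
```

Because the separator pattern consumes the surrounding whitespace, the captured fields come back without leading or trailing spaces.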
Upvotes: 4
Reputation: 2655
There are a few things wrong with what you have.
Firstly, - is a repetition symbol in Lua patterns: http://www.lua.org/manual/5.2/manual.html#6.4.1
You need to use %- to get a literal -.
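A small illustration of the difference, using string.find on a made-up sample string. Unescaped, the - applies to the preceding space as a lazy repetition, so the pattern ends up matching a single space:

```lua
local s = "a - b"

-- " - " is read as: zero or more spaces (as few as possible), then one space.
-- It therefore matches just the first space.
print(s:find(" - "))           --> 2  2

-- Escaping the hyphen makes it literal, so the whole separator is matched.
print(s:find(" %- "))          --> 2  4

-- Passing true as the fourth argument turns pattern matching off entirely.
print(s:find(" - ", 1, true))  --> 2  4
```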
We're not done: the resulting gmatch call is string.gmatch(inputstr, "[^%s%-%s]+"). Since your separator pattern is inside [], it's a character class. It says "Give me all the things that aren't a space or a -, and be as greedy as you can", which is why it stops at the first space character.
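To see the character-class behaviour concretely, here is a short sketch that runs that pattern over a shortened version of the sample row and collects every match:

```lua
local row = "LUCKY STRIKE - 11223344 - @lucky"
local pieces = {}

-- The set [^%s%-%s] matches any single character that is neither
-- whitespace nor a hyphen; the + extends it to the longest such run.
for piece in row:gmatch("[^%s%-%s]+") do
  pieces[#pieces + 1] = piece
end

print(table.concat(pieces, "|"))  --> LUCKY|STRIKE|11223344|@lucky
```

The order of characters inside the brackets carries no meaning, so the set cannot express "a space, then a hyphen, then a space" as a sequence.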
Your best bet is to do something like:
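local function splitstring(inputstr)
  local sep = "%-"    -- escaped, so it means a literal hyphen
  local t, i = {}, 1  -- keep both local instead of leaking globals
  -- each match is the longest run of characters that are not a hyphen
  for str in string.gmatch(inputstr, "[^"..sep.."]+") do
    t[i] = str
    i = i + 1
  end
  print("=========="..t[1].."===========")
  print("=========="..t[2].."===========")
  print("=========="..t[3].."===========")
  return t
end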
Which yields:
==========LUCKY STRIKE ===========
========== 11223344 ===========
========== @lucky ===========
... That takes care of the splitting. The spaces around each value are a separate problem that you can now fix independently.
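One possible way to deal with the leftover spaces is to trim each piece as it is stored. This is only a sketch: trim and split_on_hyphen are hypothetical helper names, not part of the code above.

```lua
-- Hypothetical helper: strip leading and trailing whitespace.
local function trim(s)
  return s:match("^%s*(.-)%s*$")
end

-- Hypothetical helper: split on hyphens, then trim each piece.
local function split_on_hyphen(line)
  local t = {}
  for piece in line:gmatch("[^%-]+") do
    t[#t + 1] = trim(piece)
  end
  return t
end

local t = split_on_hyphen("LUCKY STRIKE - 11223344 - @lucky")
print("[" .. t[1] .. "]")  --> [LUCKY STRIKE]
print("[" .. t[2] .. "]")  --> [11223344]
print("[" .. t[3] .. "]")  --> [@lucky]
```

Note that this still splits on every hyphen, so a field that contains one (a timestamp such as may-5th, say) would be cut in two.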
Upvotes: 3