Reputation: 33
I have a very large TSV file. The first line contains the headers. Each following line contains data fields separated by tabs; a blank field shows up as a double tab. Otherwise the fields can contain alphanumerics, or alphanumerics plus punctuation marks.
For example:
Field1<tab>Field2<tab>FieldN<newline>
The fields may contain spaces, punctuation or alphanumerics. The only things that remain true are:
I've tried many combinations of pattern matching in Lua and never get it quite right. Typically the fields with punctuation (time and date fields) are the ones that trip me up.
I need the blank fields (the ones with double-tab) preserved so that the rest of the fields are always at the same index value.
Thanks in advance!
Upvotes: 3
Views: 524
Reputation: 521
This reads the file line by line and stores the values by column and row, so that tables[n_column][n_row] = value:
    local filename = "big_tables.tsv" -- tab-separated values
    -- local filename = "big_tables.csv" -- comma-separated values
    local tables = {} -- columns and rows as tables[n_column][n_row] = value
    local row = 0
    for line in io.lines(filename) do -- row iterator
        row = row + 1
        local col = 0
        -- append a tab so every field, including blank ones, ends with a tab
        for value in (line .. "\t"):gmatch("(.-)\t") do -- tab-separated values
        -- for value in (line .. ","):gmatch("(.-),") do -- comma-separated values (no quoted fields)
            col = col + 1 -- column iterator
            tables[col] = tables[col] or {} -- if the column does not exist yet, create it
            tables[col][row] = tonumber(value) or value -- keep non-numeric fields as strings
        end
    end
Upvotes: 0
Reputation: 72312
Try the code below:
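    function test(s)
        local n = 0
        s = s .. '\t' -- terminate the last field with a tab
        for w in s:gmatch("(.-)\t") do -- shortest run of characters up to the next tab
            n = n + 1
            print(n, "[" .. w .. "]")
        end
    end
    test("10\t20\t30\t\t50")
    test("100\t200\t300\t\t500\t")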
It adds a tab to the end of the string so that all fields are followed by a tab, even the last one. Blank fields come out as empty strings, so the remaining fields keep the same index.
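If you want the fields in a table instead of printed, the same idea can be wrapped in a small helper. This is only a minimal sketch: split_tsv is a made-up name, and the sample line is invented to resemble the date and time fields mentioned in the question.

```lua
-- Split one TSV line into a table of fields, keeping blank fields as "".
-- split_tsv is a hypothetical helper name, not part of the Lua standard library.
local function split_tsv(line)
  local fields = {}
  -- Append a tab so that every field, including the last, is tab-terminated.
  for field in (line .. "\t"):gmatch("(.-)\t") do
    fields[#fields + 1] = field
  end
  return fields
end

-- Invented sample line: a timestamp, a blank field, and text with punctuation.
local row = split_tsv("2013-05-01 10:30:00\t\tsome text, with punctuation")
print(#row)          -- 3
print(row[2] == "")  -- true
```

For a real file you could call split_tsv on each line returned by io.lines(filename) and treat the first result as the header row. Note that a line ending in a tab produces one extra empty field at the end, just like the second test call above.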
Upvotes: 2