kalimin

Reputation: 29

How to read the number of lines in a LARGE text file efficiently

I have a large text file of ~750,000 lines that gets updated constantly every few seconds, and I want to be able to monitor the number of lines in real time. I am able to do that, but at a very heavy cost of response time.

function GetFileSize( filename )
  local fp = io.open( filename )
  if fp == nil then
    return nil
  end
  file = {}
  for line in fp:lines() do
    if (file[line] ~= line) then
      table.insert(file, line)
    end
  end
  d(table.size(file))
  local filesize = fp:seek( "end" )
  fp:close()
  return filesize
end

I'm trying to get two things, the size (bytes) and the number of lines.

However, filling the table up with 750,000 lines over and over, reading the file from top to bottom, constantly, causes quite a bit of processing time.

Is there a way to get both the file size in bytes and the number of lines without severely hindering my system?

Pretty much I'm guessing I have to create a permanent table outside of the function, where you read the file and add the lines to the table. However, I'm not sure how to stop it from duplicating itself every few seconds.

Should I just abandon the line count and stick with the byte return, since that doesn't slow me down at all? Or is there an efficient way to get both?

Thanks!

Upvotes: 1

Views: 1801

Answers (3)

moteus

Reputation: 2235

I can suggest this solution, which does not require loading the whole large file into memory at once.

-- Count plain-text occurrences of ch in str.
local function char_count(str, ch)
  local n, p = 0, 1
  while true do
    p = string.find(str, ch, p, true) -- plain find, no pattern matching
    if not p then break end
    n, p = n + 1, p + 1
  end
  return n
end

local function file_info(name, chunk_size)
  chunk_size = chunk_size or 4096
  local f, err, no = io.open(name, 'rb')
  if not f then return nil, err, no end
  local lines, size = 0, 0
  while true do
    local chunk = f:read(chunk_size)
    if not chunk then break end
    lines = lines + char_count(chunk, '\n')
    size = size + #chunk
  end
  f:close()
  return size, lines
end

But if you just need to monitor one file and count the lines in it, you could simply use any file-monitoring solution. I use one based on LibUV.
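Whichever way the change notification arrives, the counting itself can be made incremental. Here is a minimal sketch, assuming the file is only ever appended to (the question does not say so explicitly). `new_line_counter` is a made-up name, not part of any library. It remembers how many bytes it has already scanned and only counts newlines in the bytes added since the last call.

```lua
-- Hypothetical helper: keeps running totals for one append-only file.
local function new_line_counter(path)
  local offset, lines = 0, 0      -- bytes already scanned, newlines seen so far

  -- Each call scans only the bytes added since the previous call.
  return function()
    local f, err = io.open(path, "rb")
    if not f then return nil, err end

    local size = f:seek("end")
    if size < offset then         -- file shrank (truncated or rotated): start over
      offset, lines = 0, 0
    end

    f:seek("set", offset)
    while true do
      local chunk = f:read(65536) -- assumed chunk size, tune as needed
      if not chunk then break end
      local _, n = chunk:gsub("\n", "")
      lines = lines + n
      offset = offset + #chunk
    end
    f:close()
    return offset, lines          -- bytes scanned so far, newline count
  end
end
```

Create the counter once with `local poll = new_line_counter(filename)` and call `poll()` every few seconds; after the first call each poll should only cost as much as the newly appended data. If the file is rewritten in place rather than appended to, the totals will be wrong and a full rescan is needed.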

Upvotes: 1

Henri Menke

Reputation: 10939

To get the file size in bytes, use LuaFileSystem. For the number of lines, you might want to use the io.lines iterator. For better performance of the latter, there is a trick described in »Programming in Lua«.

local file = arg[0] -- just use the source file for demo

-- Get the file size
local lfs = assert(require"lfs")
local attr = lfs.attributes(file)
print(attr.size)

-- Get number of lines
local count = 0
for line in io.lines(file) do
   count = count + 1
end
print(count)
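The answer does not say which trick is meant; presumably it is the block-reading idiom from the book's I/O chapter, where each read fetches a large block plus the rest of the line it ends in. A rough sketch under that assumption follows. `count_lines` and `BUFSIZE` are names chosen here for illustration, and the file is assumed to end with a newline.

```lua
local BUFSIZE = 8192  -- block size in bytes; an assumed value, tune as needed

-- Hypothetical helper: count lines by reading large blocks instead of single lines.
local function count_lines(path)
  local f, err = io.open(path, "rb")
  if not f then return nil, err end
  local count = 0
  while true do
    -- Read a block plus the remainder of the line it ends in,
    -- so no line is ever split across two iterations.
    local block, rest = f:read(BUFSIZE, "*l")
    if not block then break end
    -- "*l" strips the newline, so put it back before counting.
    if rest then block = block .. rest .. "\n" end
    local _, n = block:gsub("\n", "")
    count = count + n
  end
  f:close()
  return count
end
```

For pure line counting the completed line is not strictly required, since newlines can be counted in arbitrary chunks, but it matters as soon as you want to process whole lines per block. Whether this beats io.lines on your file is something to measure.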

Upvotes: 1

lhf

Reputation: 72412

Try reading the whole file at once and count the number of lines with gsub. You'll have to test whether this is fast enough for you.

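 local t = f:read("*a")          -- f is an open file handle, e.g. from io.open
 local _, n = t:gsub("\n", "")   -- n is the number of newline characters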
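To check whether it is fast enough, something along these lines could be used. It is only a sketch: `count_newlines` and `time_it` are made-up helper names, and os.clock measures CPU time, which is a rough proxy for the delay you actually notice.

```lua
-- Read the whole file and count newlines with gsub, as suggested above.
local function count_newlines(path)
  local f, err = io.open(path, "rb")
  if not f then return nil, err end
  local text = f:read("*a")
  f:close()
  local _, n = text:gsub("\n", "")
  return n, #text               -- line count and size in bytes
end

-- Hypothetical timing helper: average CPU seconds per call over `runs` calls.
local function time_it(fn, runs, ...)
  local start = os.clock()
  for _ = 1, runs do fn(...) end
  return (os.clock() - start) / runs
end
```

Calling `time_it(count_newlines, 10, filename)` on the real file gives an average cost per scan that you can compare against your polling interval.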

Upvotes: 1
