Reputation: 783
I want to read the lines of a file from an SFTP server. The file has more than 100,000 lines.
I have tried reading it in two ways:
Net::SSH.start(setting.host, setting.user,
  {
    :key_data => [ key ],
    :keys => [],
    :keys_only => true
  }
) do |ssh|
  ssh.sftp.connect do |sftp|
    sftp.dir.foreach(src_dir) do |entry|
      if entry.name.include? today
        filename = "#{src_dir}/#{entry.name}"
        sftp.file.open(filename, "r") do |f|
          # Way 1
          f.readlines.each do |line|
            parse(line)
          end
          # Way 2
          while line = f.gets do
            parse(line)
          end
        end
      end
    end
  end
end
I want to know which way uses less memory.
Upvotes: 0
Views: 827
Reputation: 15248
Generally, loops are faster than blocks because a block creates its own scope.
Arrays also take a lot of memory.
#readlines reads all of the lines in ios, and returns them in an array.
#gets reads the next line from the I/O stream.
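To make the difference between the two descriptions concrete, here is a minimal sketch. It assumes a StringIO as an in-memory stand-in for the real file, and the helper names process_with_readlines and process_with_gets are made up for illustration.

```ruby
require 'stringio'

# In-memory stand-in for the file (assumption for illustration only).
SAMPLE = "alpha\nbeta\ngamma\n"

# Way 1: #readlines builds an array of every line first, then iterates over it.
def process_with_readlines(io)
  io.readlines.each { |line| yield line }
end

# Way 2: #gets hands back one line per call and returns nil at end of input,
# so only the current line needs to be referenced at any moment.
def process_with_gets(io)
  while (line = io.gets)
    yield line
  end
end

via_readlines = []
process_with_readlines(StringIO.new(SAMPLE)) { |line| via_readlines << line.chomp }

via_gets = []
process_with_gets(StringIO.new(SAMPLE)) { |line| via_gets << line.chomp }

puts via_readlines.inspect
puts via_gets.inspect
```

Both helpers yield the same lines in the same order. The difference is only in whether the whole array exists before the first line is processed.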
I wrote a small benchmark for a file with 1,387,085 lines.
I also added ::readlines, which reads the entire file specified by name as individual lines and returns those lines in an array, and ::foreach, which executes the block for every line in the named I/O port.
require 'benchmark/ips'
require 'benchmark/memory'

@path = File.join(__dir__, 'file.txt')

def open_readlines
  File.open(@path, 'r') do |f|
    f.readlines.each do |line|
      line << 'www'
    end
  end
end

def open_gets
  File.open(@path, 'r') do |f|
    while line = f.gets do
      line << 'www'
    end
  end
end

def readlines
  File.readlines(@path).each do |line|
    line << 'www'
  end
end

def foreach
  File.foreach(@path) do |line|
    line << 'www'
  end
end

%i[ips memory].each do |benchmark|
  puts benchmark
  Benchmark.send(benchmark) do |x|
    x.report('::open #readlines') { open_readlines }
    x.report('::open #gets') { open_gets }
    x.report('::readlines') { readlines }
    x.report('::foreach') { foreach }
    x.compare!
  end
end
And the results are:
ips
Warming up --------------------------------------
   ::open #readlines     1.000  i/100ms
        ::open #gets     1.000  i/100ms
         ::readlines     1.000  i/100ms
           ::foreach     1.000  i/100ms
Calculating -------------------------------------
   ::open #readlines      0.575  (± 0.0%) i/s -      3.000  in   5.397538s
        ::open #gets      0.746  (± 0.0%) i/s -      4.000  in   5.381583s
         ::readlines      0.570  (± 0.0%) i/s -      3.000  in   5.434956s
           ::foreach      0.826  (± 0.0%) i/s -      5.000  in   6.057936s
Comparison:
           ::foreach:        0.8 i/s
        ::open #gets:        0.7 i/s - 1.11x slower
   ::open #readlines:        0.6 i/s - 1.44x slower
         ::readlines:        0.6 i/s - 1.45x slower

memory
Calculating -------------------------------------
   ::open #readlines   822.274M memsize (     8.424k retained)
                         2.774M objects (     1.000  retained)
                        50.000  strings (     0.000  retained)
        ::open #gets   810.638M memsize (     0.000  retained)
                         2.774M objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
         ::readlines   822.274M memsize (     0.000  retained)
                         2.774M objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
           ::foreach   810.638M memsize (     0.000  retained)
                         2.774M objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
Comparison:
           ::foreach:  810638012 allocated
        ::open #gets:  810638052 allocated - 1.00x more
         ::readlines:  822274324 allocated - 1.01x more
   ::open #readlines:  822274364 allocated - 1.01x more
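One way to read the nearly identical memsize figures above: they count the bytes allocated over the whole run, and every approach creates one String per line. What differs is how much has to stay reachable at the same time. The following is only a rough sketch of that idea, assuming a StringIO in place of the file and using ObjectSpace.memsize_of from the objspace standard library; the helper names held_with_readlines and held_with_gets are hypothetical, and the byte counts are implementation-dependent estimates.

```ruby
require 'objspace'
require 'stringio'

# In-memory stand-in for the real file (assumption for illustration only).
FAKE_FILE = Array.new(1_000) { |i| "line number #{i}\n" }.join

# With #readlines, the array and every line it references are alive together,
# so add up the array's own size and the size of each line.
def held_with_readlines(io)
  lines = io.readlines
  ObjectSpace.memsize_of(lines) + lines.sum { |line| ObjectSpace.memsize_of(line) }
end

# With #gets, only the current line has to be referenced, so the largest
# single line is a rough estimate of what must be held at once.
def held_with_gets(io)
  largest = 0
  while (line = io.gets)
    size = ObjectSpace.memsize_of(line)
    largest = size if size > largest
  end
  largest
end

readlines_bytes = held_with_readlines(StringIO.new(FAKE_FILE))
gets_bytes      = held_with_gets(StringIO.new(FAKE_FILE))

puts "readlines keeps roughly #{readlines_bytes} bytes reachable at once"
puts "gets keeps roughly #{gets_bytes} bytes reachable at once"
```

The readlines figure grows with the number of lines, while the gets figure depends only on the longest line. When lines processed by gets are actually freed is up to the garbage collector.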
Upvotes: 1
Reputation: 10056
What do the docs say? (Note that File is a subclass of IO. The methods #readlines and #gets are defined on IO.)
#readlines: Reads all of the lines […], and returns them in an array.
#gets: Reads the next “line” from the I/O stream.
Thus, I expect the latter to be better in terms of memory usage as it doesn't load the entire file into memory.
Upvotes: 2