Reputation: 105
I have a program that simulates typing. It asks the user for the location and name of a file (including the extension), then iterates over the file and breaks it into an array of characters.
def file_to_array(file)
  empty = []
  File.foreach("#{file}") do |line|
    empty << line.to_s.split('')
  end
  return empty.flatten!
end
When the program runs, it sends the keys to the text area to simulate typing via win32ole.
After 5,000 characters there's too much memory overhead and the program begins to slow down. The further past 5,000 characters, the slower it goes. Is there a way that this can be optimized?
--EDIT--
require 'benchmark'
def file_to_array(file)
  empty = []
  File.foreach(file) do |line|
    empty << line.to_s.split('')
  end
  return empty.flatten!
end
def file_to_array_2(file)
  File.read(file).split('')
end
file = 'xxx'
Benchmark.bm do |results|
  results.report { print file_to_array(file) }
  results.report { print file_to_array_2(file) }
end
       user     system      total        real
   0.234000   0.000000   0.234000 (  0.787020)
   0.218000   0.000000   0.218000 (  1.917185)
Upvotes: 2
Views: 266
Reputation: 8908
I ran my own benchmark and profile; here is the code:
#!/usr/bin/env ruby
require 'benchmark'
require 'rubygems'
require 'ruby-prof'
def ftoa_1(path)
  empty = []
  File.foreach(path) do |line|
    empty << line.to_s.split('')
  end
  return empty.flatten!
end
def ftoa_2(path)
  File.read(path).split('')
end
def ftoa_3(path)
  File.read(path).chars
end
def ftoa_4(path)
  File.open(path) { |f| f.each_char.to_a }
end
GC.start
GC.disable
Benchmark.bm(6) do |x|
  1.upto(4) do |n|
    x.report("ftoa_#{n}") {send("ftoa_#{n}", ARGV[0])}
  end
end
1.upto(4) do |n|
  puts "\nProfiling ftoa_#{n} ...\n"
  result = RubyProf.profile do
    send("ftoa_#{n}", ARGV[0])
  end
  RubyProf::FlatPrinter.new(result).print($stdout)
end
And here is my result:
             user     system      total        real
ftoa_1   2.090000   0.160000   2.250000 (  2.250350)
ftoa_2   1.540000   0.090000   1.630000 (  1.632173)
ftoa_3   0.420000   0.080000   0.500000 (  0.505286)
ftoa_4   0.550000   0.090000   0.640000 (  0.630003)
Profiling ftoa_1 ...
Measure Mode: wall_time
Thread ID: 70190654290440
Fiber ID: 70189795562220
Total: 2.571306
Sort by: self_time
%self total self wait child calls name
83.39 2.144 2.144 0.000 0.000 103930 String#split
12.52 0.322 0.322 0.000 0.000 1 Array#flatten!
3.52 2.249 0.090 0.000 2.159 1 <Class::IO>#foreach
0.57 0.015 0.015 0.000 0.000 103930 String#to_s
0.00 2.571 0.000 0.000 2.571 1 Global#[No method]
0.00 2.571 0.000 0.000 2.571 1 Object#ftoa_1
0.00 0.000 0.000 0.000 0.000 1 Fixnum#to_s
* indicates recursively called methods
Profiling ftoa_2 ...
Measure Mode: wall_time
Thread ID: 70190654290440
Fiber ID: 70189795562220
Total: 1.855242
Sort by: self_time
%self total self wait child calls name
99.77 1.851 1.851 0.000 0.000 1 String#split
0.23 0.004 0.004 0.000 0.000 1 <Class::IO>#read
0.00 1.855 0.000 0.000 1.855 1 Global#[No method]
0.00 1.855 0.000 0.000 1.855 1 Object#ftoa_2
0.00 0.000 0.000 0.000 0.000 1 Fixnum#to_s
* indicates recursively called methods
Profiling ftoa_3 ...
Measure Mode: wall_time
Thread ID: 70190654290440
Fiber ID: 70189795562220
Total: 0.721246
Sort by: self_time
%self total self wait child calls name
99.42 0.717 0.717 0.000 0.000 1 String#chars
0.58 0.004 0.004 0.000 0.000 1 <Class::IO>#read
0.00 0.721 0.000 0.000 0.721 1 Object#ftoa_3
0.00 0.721 0.000 0.000 0.721 1 Global#[No method]
0.00 0.000 0.000 0.000 0.000 1 Fixnum#to_s
* indicates recursively called methods
Profiling ftoa_4 ...
Measure Mode: wall_time
Thread ID: 70190654290440
Fiber ID: 70189795562220
Total: 0.816140
Sort by: self_time
%self total self wait child calls name
99.99 0.816 0.816 0.000 0.000 2 IO#each_char
0.00 0.000 0.000 0.000 0.000 1 File#initialize
0.00 0.000 0.000 0.000 0.000 1 IO#close
0.00 0.816 0.000 0.000 0.816 1 <Class::IO>#open
0.00 0.000 0.000 0.000 0.000 1 IO#closed?
0.00 0.816 0.000 0.000 0.816 1 Global#[No method]
0.00 0.816 0.000 0.000 0.816 1 Enumerable#to_a
0.00 0.816 0.000 0.000 0.816 1 Enumerator#each
0.00 0.816 0.000 0.000 0.816 1 Object#ftoa_4
0.00 0.000 0.000 0.000 0.000 1 Fixnum#to_s
* indicates recursively called methods
The conclusion is that ftoa_3 is the fastest when GC is turned off, but I would recommend ftoa_4 because it uses less memory and thus triggers GC less often. If you turn GC on, you can see that ftoa_4 is the fastest.
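If the simulator only ever consumes one character at a time, the same each_char call can be used without building the array at all. This is only a sketch under that assumption: each_typed_char is a hypothetical helper name, and the print call stands in for whatever win32ole call actually sends the key, which the question does not show.

```ruby
require 'tempfile'

# Hypothetical helper: yields each character of the file in order
# without first collecting them into an array.
def each_typed_char(path)
  return enum_for(:each_typed_char, path) unless block_given?

  File.open(path) do |f|
    f.each_char { |ch| yield ch }
  end
end

# Usage sketch against a throwaway file; print is a stand-in for the
# real key-sending call.
Tempfile.create('typing') do |f|
  f.write("hi\n")
  f.flush
  each_typed_char(f.path) { |ch| print ch }
end
```

Whether this actually helps depends on where the slowdown comes from; if the time is spent sending keys rather than reading the file, it will not change much.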
From the profile results, you can see that the program spends most of its time in String#split in both ftoa_1 and ftoa_2. ftoa_1 is the worst because String#split runs many times (once per line), and Array#flatten! also takes a lot of time.
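Before comparing timings it is worth confirming that the four variants really return the same array. A minimal check, assuming a small ASCII sample file (same_result? and the sample text are made up for illustration; the ftoa_* bodies are the ones from the benchmark script):

```ruby
require 'tempfile'

def ftoa_1(path)
  empty = []
  File.foreach(path) { |line| empty << line.to_s.split('') }
  empty.flatten!
end

def ftoa_2(path)
  File.read(path).split('')
end

def ftoa_3(path)
  File.read(path).chars
end

def ftoa_4(path)
  File.open(path) { |f| f.each_char.to_a }
end

# Hypothetical helper: writes text to a temp file and reports whether
# all four variants produce an identical array of characters.
def same_result?(text)
  Tempfile.create('ftoa') do |f|
    f.write(text)
    f.flush
    results = (1..4).map { |n| send("ftoa_#{n}", f.path) }
    results.uniq.size == 1
  end
end

puts same_result?("ab\ncd\n")
```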
Upvotes: 2
Reputation: 107142
Yes, this can be optimized (written shorter, with fewer assignments and fewer method calls):
def file_to_array(file)
  File.read(file).split('')
end
This works because file is already a string, so there is no need for the string interpolation "#{file}". File.read returns the whole file, which removes the need to iterate over each line. Without the iteration there is no need for the temporary empty array, the flatten!, or the array append <<. There is also no need for the explicit return in your example.
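A small illustration of two of these points, using made-up values (the path string and the arrays below are placeholders, not taken from the question). It also shows one behavioral difference between the two versions: flatten! returns nil when there is nothing to flatten, so the original version returns nil for an empty file, while split('') returns an empty array.

```ruby
path = 'notes.txt'            # hypothetical file name, never opened here
same = ("#{path}" == path)    # interpolating a String gives an equal String

nested = [['a', 'b'], ['c']]
flat = nested.flatten!        # flattens in place and returns the array

unchanged = [].flatten!       # nothing to flatten, so this is nil
empty_split = ''.split('')    # an empty string splits into an empty array

puts same.inspect
puts flat.inspect
puts unchanged.inspect
puts empty_split.inspect
```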
Update: It is not clear from your question what you are optimizing for: performance, memory usage, or readability. Since I was surprised by your benchmark results, I ran my own, and in it my solution is faster than yours.
But the results might differ depending on the Ruby version (I used Ruby 2.3), the input file size and number of lines, or the number of iterations run in the benchmark.
def file_to_array_1(file)
  empty = []
  File.foreach("#{file}") do |line|
    empty << line.to_s.split('')
  end
  return empty.flatten!
end
def file_to_array_2(file)
  File.read(file).split('')
end
require 'benchmark'
# file = '...' # a path to a file with about 26KB data in about 750 lines
n = 1000
Benchmark.bmbm(15) do |x|
  x.report("version 1 :") { n.times do; file_to_array_1(file); end }
  x.report("version 2 :") { n.times do; file_to_array_2(file); end }
end
# Rehearsal ---------------------------------------------------
# version 1 : 11.970000 0.110000 12.080000 ( 12.092841)
# version 2 : 8.150000 0.120000 8.270000 ( 8.267420)
# ----------------------------------------- total: 20.350000sec
# user system total real
# version 1 : 11.940000 0.100000 12.040000 ( 12.045505)
# version 2 : 8.130000 0.110000 8.240000 ( 8.248707)
# [Finished in 40.7s]
Upvotes: 0