Reputation: 105

Optimizing Ruby Arrays or Hashes

I have a program that simulates typing. It asks the user for the file's location and name (including the extension), then reads the file line by line and stores its characters in an array.

def file_to_array(file)
  empty = []
  File.foreach("#{file}") do |line|
    empty << line.to_s.split('')
  end
  return empty.flatten!
end

When the program runs, it sends each character as a keystroke to a text area via win32ole to simulate typing.

After 5,000 characters, the memory overhead becomes too high and the program begins to slow down; the further past 5,000 characters it gets, the slower it runs. Is there a way to optimize this?

--EDIT--

require 'benchmark'

def file_to_array(file)
  empty = []
  File.foreach(file) do |line|
    empty << line.to_s.split('')
  end
  return empty.flatten!
end
def file_to_array_2(file)
  File.read(file).split('')
end

file = 'xxx'

Benchmark.bm do |results|
    results.report { print file_to_array(file) }
    results.report { print file_to_array_2(file) }
end
    user     system      total        real
 0.234000   0.000000   0.234000 (  0.787020)
 0.218000   0.000000   0.218000 (  1.917185)

Upvotes: 2

Answers (2)

Aetherus

Reputation: 8908

I benchmarked and profiled the different approaches. Here is the code:

#!/usr/bin/env ruby
require 'benchmark'
require 'rubygems'
require 'ruby-prof'

def ftoa_1(path)
  empty = []
  File.foreach(path) do |line|
    empty << line.to_s.split('')
  end
  return empty.flatten!
end

def ftoa_2(path)
  File.read(path).split('')
end

def ftoa_3(path)
  File.read(path).chars
end

def ftoa_4(path)
  File.open(path) { |f| f.each_char.to_a }
end

GC.start
GC.disable

Benchmark.bm(6) do |x|
  1.upto(4) do |n|
    x.report("ftoa_#{n}") {send("ftoa_#{n}", ARGV[0])}
  end
end

1.upto(4) do |n|
  puts "\nProfiling ftoa_#{n} ...\n"

  result = RubyProf.profile do
    send("ftoa_#{n}", ARGV[0])
  end

  RubyProf::FlatPrinter.new(result).print($stdout)
end

And here are my results:

             user     system      total        real
ftoa_1   2.090000   0.160000   2.250000 (  2.250350)
ftoa_2   1.540000   0.090000   1.630000 (  1.632173)
ftoa_3   0.420000   0.080000   0.500000 (  0.505286)
ftoa_4   0.550000   0.090000   0.640000 (  0.630003)

Profiling ftoa_1 ...
Measure Mode: wall_time
Thread ID: 70190654290440
Fiber ID: 70189795562220
Total: 2.571306
Sort by: self_time

 %self      total      self      wait     child     calls  name
 83.39      2.144     2.144     0.000     0.000   103930   String#split
 12.52      0.322     0.322     0.000     0.000        1   Array#flatten!
  3.52      2.249     0.090     0.000     2.159        1   <Class::IO>#foreach
  0.57      0.015     0.015     0.000     0.000   103930   String#to_s
  0.00      2.571     0.000     0.000     2.571        1   Global#[No method]
  0.00      2.571     0.000     0.000     2.571        1   Object#ftoa_1
  0.00      0.000     0.000     0.000     0.000        1   Fixnum#to_s

* indicates recursively called methods

Profiling ftoa_2 ...
Measure Mode: wall_time
Thread ID: 70190654290440
Fiber ID: 70189795562220
Total: 1.855242
Sort by: self_time

 %self      total      self      wait     child     calls  name
 99.77      1.851     1.851     0.000     0.000        1   String#split
  0.23      0.004     0.004     0.000     0.000        1   <Class::IO>#read
  0.00      1.855     0.000     0.000     1.855        1   Global#[No method]
  0.00      1.855     0.000     0.000     1.855        1   Object#ftoa_2
  0.00      0.000     0.000     0.000     0.000        1   Fixnum#to_s

* indicates recursively called methods

Profiling ftoa_3 ...
Measure Mode: wall_time
Thread ID: 70190654290440
Fiber ID: 70189795562220
Total: 0.721246
Sort by: self_time

 %self      total      self      wait     child     calls  name
 99.42      0.717     0.717     0.000     0.000        1   String#chars
  0.58      0.004     0.004     0.000     0.000        1   <Class::IO>#read
  0.00      0.721     0.000     0.000     0.721        1   Object#ftoa_3
  0.00      0.721     0.000     0.000     0.721        1   Global#[No method]
  0.00      0.000     0.000     0.000     0.000        1   Fixnum#to_s

* indicates recursively called methods

Profiling ftoa_4 ...
Measure Mode: wall_time
Thread ID: 70190654290440
Fiber ID: 70189795562220
Total: 0.816140
Sort by: self_time

 %self      total      self      wait     child     calls  name
 99.99      0.816     0.816     0.000     0.000        2   IO#each_char
  0.00      0.000     0.000     0.000     0.000        1   File#initialize
  0.00      0.000     0.000     0.000     0.000        1   IO#close
  0.00      0.816     0.000     0.000     0.816        1   <Class::IO>#open
  0.00      0.000     0.000     0.000     0.000        1   IO#closed?
  0.00      0.816     0.000     0.000     0.816        1   Global#[No method]
  0.00      0.816     0.000     0.000     0.816        1   Enumerable#to_a
  0.00      0.816     0.000     0.000     0.816        1   Enumerator#each
  0.00      0.816     0.000     0.000     0.816        1   Object#ftoa_4
  0.00      0.000     0.000     0.000     0.000        1   Fixnum#to_s

* indicates recursively called methods

The conclusion is that ftoa_3 is the fastest when GC is turned off, but I would recommend ftoa_4 because it uses less memory and therefore triggers garbage collection less often. If you turn GC on, you can see that ftoa_4 becomes the fastest.
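
If memory is the main concern, the each_char idea behind ftoa_4 can be taken one step further: since the characters are sent out one at a time anyway, they may not need to be collected into an array at all. A minimal sketch, assuming a hypothetical `type_file` helper that yields each character to a block; in the real program the block would send the keystroke through win32ole, which is not shown here:

```ruby
require 'tempfile'

# Hypothetical helper: stream a file one character at a time.
# Only IO's read buffer is held in memory, not an array of every character.
def type_file(path)
  File.open(path) do |f|
    f.each_char { |c| yield c }
  end
end

# Usage with a throwaway file; the block stands in for the win32ole keystroke call.
# Collecting into `typed` here is only for demonstration.
Tempfile.create('typing') do |tmp|
  tmp.write("hi\nyo")
  tmp.flush
  typed = []
  type_file(tmp.path) { |c| typed << c }
  p typed # => ["h", "i", "\n", "y", "o"]
end
```

Whether this removes the slowdown described in the question depends on where the time actually goes (array growth versus the keystroke calls), so it is worth profiling in the real program.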

The profiles show that both ftoa_1 and ftoa_2 spend most of their time in String#split. ftoa_1 is the slowest because String#split is called once per line (103,930 calls here), and Array#flatten! adds a significant amount of time on top of that.
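
If you want to keep reading line by line, both the per-line String#split and the final Array#flatten! can be avoided. A sketch using String#chars with Enumerable#flat_map; the name `ftoa_5` is hypothetical and this variant was not part of the benchmark above, so its speed relative to ftoa_3 and ftoa_4 is untested:

```ruby
require 'tempfile'

# Hypothetical variant, not benchmarked above: still reads line by line,
# but uses String#chars instead of split('') and lets flat_map build one
# flat array, so no separate flatten! pass is needed.
def ftoa_5(path)
  File.foreach(path).flat_map(&:chars)
end

Tempfile.create('ftoa') do |tmp|
  tmp.write("ab\ncd\n")
  tmp.flush
  p ftoa_5(tmp.path)                                # => ["a", "b", "\n", "c", "d", "\n"]
  p ftoa_5(tmp.path) == File.read(tmp.path).chars   # => true
end
```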

Upvotes: 2

spickermann

Reputation: 107142

Yes, this can be optimized (written shorter, with fewer assignments and fewer method calls):

def file_to_array(file)
  File.read(file).split('')
end

This works because file is already a string, so there is no need for the string interpolation "#{file}". File.read returns the whole file at once, which removes the need to iterate over each line. Without the iteration there is no need for the temporary empty array, the flatten!, or the array append <<. And there is no need for the explicit return in your example.
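
A quick way to check that the shorter version returns the same array as the original is to run both on a throwaway file. The check below uses Ruby's standard Tempfile and the two methods renamed so they can coexist. It also shows one behavioral difference: Array#flatten! returns nil when it makes no changes, so the original returns nil rather than [] for an empty file.

```ruby
require 'tempfile'

# Original version from the question.
def file_to_array_1(file)
  empty = []
  File.foreach("#{file}") do |line|
    empty << line.to_s.split('')
  end
  return empty.flatten!
end

# Shorter version from this answer.
def file_to_array_2(file)
  File.read(file).split('')
end

Tempfile.create('typing') do |tmp|
  tmp.write("one\ntwo\n")
  tmp.flush
  p file_to_array_1(tmp.path) == file_to_array_2(tmp.path) # => true
end

Tempfile.create('empty') do |tmp|
  p file_to_array_1(tmp.path) # => nil (flatten! made no changes)
  p file_to_array_2(tmp.path) # => []
end
```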

Update: It is not clear from your question what you are optimizing for: performance, memory usage, or readability. Since I was surprised by your benchmark results, I ran my own, and in my benchmark my solution is faster than yours.

But the results may differ depending on the Ruby version (I used Ruby 2.3), the input file's size and number of lines, and the number of iterations run in the benchmark.
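
Since memory usage is one of the possible goals, a rough complementary measurement is the number of Ruby objects each version allocates. A minimal sketch, assuming Ruby 2.2 or later (where GC.stat accepts the :total_allocated_objects key); the `allocations` helper is hypothetical, and object counts are only a proxy for memory, not a byte measurement:

```ruby
require 'tempfile'

# Hypothetical helper: number of objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

def file_to_array_1(file)
  empty = []
  File.foreach("#{file}") do |line|
    empty << line.to_s.split('')
  end
  return empty.flatten!
end

def file_to_array_2(file)
  File.read(file).split('')
end

Tempfile.create('alloc') do |tmp|
  tmp.write("some text\n" * 500) # 500 short lines
  tmp.flush
  v1 = allocations { file_to_array_1(tmp.path) }
  v2 = allocations { file_to_array_2(tmp.path) }
  puts "version 1: #{v1} objects"
  puts "version 2: #{v2} objects"
end
```

Version 1 allocates an extra string and an extra array per line on top of the one-character strings that both versions create, so its count should grow faster with the number of lines.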

def file_to_array_1(file)
  empty = []
  File.foreach("#{file}") do |line|
    empty << line.to_s.split('')
  end
  return empty.flatten!
end

def file_to_array_2(file)
  File.read(file).split('')
end

require 'benchmark'

file = ARGV.fetch(0) # pass the path on the command line: a file with about 26KB of data in about 750 lines
n = 1000

Benchmark.bmbm(15) do |x|
  x.report("version 1 :")   { n.times { file_to_array_1(file) } }
  x.report("version 2 :")   { n.times { file_to_array_2(file) } }
end

# Rehearsal ---------------------------------------------------
# version 1 :      11.970000   0.110000  12.080000 ( 12.092841)
# version 2 :       8.150000   0.120000   8.270000 (  8.267420)
# ----------------------------------------- total: 20.350000sec

#                       user     system      total        real
# version 1 :      11.940000   0.100000  12.040000 ( 12.045505)
# version 2 :       8.130000   0.110000   8.240000 (  8.248707)
# [Finished in 40.7s]

Upvotes: 0
