Stefan
Stefan

Reputation: 114268

How to efficiently concatenate multiple arrays in Ruby?

I just wanted to concatenate multiple arrays in Ruby and couldn't find a satisfying way to do so.

Example input:

foo = [1, 2, 3]
bar = [4, 5, 6]
baz = [7, 8, 9]

Expected result: (without modifying the existing arrays)

[1, 2, 3, 4, 5, 6, 7, 8, 9]

My actual arrays are much larger, so I'm interested in an efficient solution. There may also be more than three arrays, so a short syntax is preferred.

What I have tried so far

Shouldn't there be a simple class method for such a basic task? Something like:

Array.concat(foo, bar, baz)

Am I missing something obvious?

Upvotes: 20

Views: 17543

Answers (3)

Stefan
Stefan

Reputation: 114268

I've created another benchmark, comparing +, concat and a custom C extension with a variable number of arrays.

Result

  • the C extension was always fastest and roughly 2-3x faster than concat
  • plus is getting really slow if you concatenate many arrays

Conclusion

Although "2-3x" sounds like a huge improvement, it's just a few milliseconds in absolute terms. I was expecting a bigger difference by not having to resize the array, but this is apparently not a huge factor.

IMO, concat is a decent performer and I see no urgent need for a C extension.


My test arrays contain nil values. Other elements don't seem to produce different results (in relative terms).

I didn't include flat_map, because it is equivalent to concat.

Concatenating 3 arrays of size 100 (10000 times)
                 user     system      total        real
plus         0.020000   0.000000   0.020000 (  0.027927)
concat       0.020000   0.010000   0.030000 (  0.033204)
c_extension  0.010000   0.010000   0.020000 (  0.010727)

Concatenating 10 arrays of size 100 (10000 times)
                 user     system      total        real
plus         0.110000   0.070000   0.180000 (  0.180417)
concat       0.050000   0.020000   0.070000 (  0.065299)
c_extension  0.010000   0.010000   0.020000 (  0.025475)

Concatenating 10 arrays of size 1000 (10000 times)
                 user     system      total        real
plus         0.690000   0.560000   1.250000 (  1.252319)
concat       0.180000   0.130000   0.310000 (  0.303365)
c_extension  0.120000   0.120000   0.240000 (  0.248589)

plus is excluded from the following results

Concatenating 10 arrays of size 100000 (100 times)
                 user     system      total        real
concat       0.220000   0.340000   0.560000 (  0.568730)
c_extension  0.130000   0.150000   0.280000 (  0.281354)

Concatenating 100 arrays of size 10000 (100 times)
                 user     system      total        real
concat       0.210000   0.320000   0.530000 (  0.519030)
c_extension  0.160000   0.140000   0.300000 (  0.304751)

Concatenating 1000 arrays of size 1000 (100 times)
                 user     system      total        real
concat       0.240000   0.330000   0.570000 (  0.563511)
c_extension  0.150000   0.120000   0.270000 (  0.283546)

Concatenating 10000 arrays of size 100 (100 times)
                 user     system      total        real
concat       0.330000   0.310000   0.640000 (  0.643987)
c_extension  0.170000   0.120000   0.290000 (  0.286489)

Concatenating 100000 arrays of size 10 (100 times)
                 user     system      total        real
concat       1.300000   0.340000   1.640000 (  1.648687)
c_extension  0.310000   0.150000   0.460000 (  0.458214)

Test code:

require 'benchmark'

values = [
  # small
  { count: 3,      size: 100,    n: 10000 },
  { count: 10,     size: 100,    n: 10000 },
  { count: 10,     size: 1000,   n: 10000 },
  # large
  { count: 10,      size: 100000, n: 100 },
  { count: 100,     size: 10000,  n: 100 },
  { count: 1000,    size: 1000,   n: 100 },
  { count: 10000,   size: 100,    n: 100 },
  { count: 100000,  size: 10,     n: 100 }
]

values.each_with_index do |h, i|
  count, size, n = h.values_at(:count, :size, :n)
  arrays = Array.new(count) { Array.new(size) }

  puts
  puts "Concatenating #{count} arrays of size #{size} (#{n} times)"
  Benchmark.bm(10) do |r|
    r.report('plus')        { n.times { arrays.reduce(:+) } } if i < 3
    r.report('concat')      { n.times { arrays.reduce([], :concat) } }
    r.report('c_extension') { n.times { Array.concat(*arrays) } }
  end
end

C extension: (a patch actually, I've added this to Ruby's array.c)

VALUE
rb_ary_s_concat(int argc, VALUE *argv, VALUE klass)
{
  VALUE ary;
  long len = 0, i;
  for (i=0; i<argc; i++) {
    argv[i] = to_ary(argv[i]);
    len += RARRAY_LEN(argv[i]);
  }
  ary = rb_ary_new2(len);
  long beg = 0;
  for (i=0; i<argc; i++) {
    ary_memcpy(ary, beg, RARRAY_LEN(argv[i]), RARRAY_CONST_PTR(argv[i]));
    beg += RARRAY_LEN(argv[i]);
  }
  ARY_SET_LEN(ary, len);
  return ary;
}

You have to register this method in Init_Array via:

rb_define_singleton_method(rb_cArray, "concat", rb_ary_s_concat, -1);

Upvotes: 11

peter
peter

Reputation: 42207

Did some benchmarks and simple + is the most efficient. So i would suggest to neglect the intermediate creation of an array.

You could add a new method concat_all to Array like this, but you would have to take into account mixed and multi-dimensional arrays also.

class Array
  def concat_all 
    self.reduce([], :+)
  end
end
[a, b, c].concat_all # a huge array
[a, b, c].concat_all.length #300000

Here my benchmarks

require 'Benchmark'
N = 1000

class Array
  def concat_all 
    self.reduce([], :+)
  end
  def concat_all2
    # just a quick test with fixed numbers for the fill method Stephan proposes but in Ruby itself
    d = Array.new(300_000)
    d[0..99999] = self[0]
    d[100_000..199999] = self[1]
    d[200_000..299999] = self[2]
    d
  end
  def concat_all3
    self.flatten
  end
end

# small arrays
a = (1..10).to_a
b = (11..20).to_a
c = (21..30).to_a

Benchmark.bm do |r|
  r.report('plus       ')  { N.times { a + b + c }}
  r.report('concat     ') { N.times { [].concat(a).concat(b).concat(c) }}
  r.report('push       ') { N.times { [].push(*a).push(*b).push(*c) }}
  r.report('<<         ') { N.times { ([] << a << b << c).flatten}}
  r.report('splash     ') { N.times {[*a, *b, *c]} }
  r.report('concat_all ')  { N.times { [a, b, c].concat_all }}
  r.report('concat_all3')  { N.times { [a, b, c].concat_all3 }}
  r.report('flat_map   ') { N.times {[a, b, c].flat_map(&:itself)} }
end

#large arrays
a = (1..100_000).to_a
b = (100_001..200_000).to_a
c = (200_001..300_000).to_a

Benchmark.bm do |r|
  r.report('plus       ')  { N.times { a + b + c }}
  r.report('concat     ') { N.times { [].concat(a).concat(b).concat(c) }}
  r.report('push       ') { N.times { [].push(*a).push(*b).push(*c) }}
  r.report('<<         ') { N.times { ([] << a << b << c).flatten}}
  r.report('splash     ') { N.times {[*a, *b, *c]} }
  r.report('concat_all ')  { N.times { [a, b, c].concat_all }}
  r.report('concat_all2')  { N.times { [a, b, c].concat_all2 }}
  r.report('concat_all3')  { N.times { [a, b, c].concat_all3 }}
  r.report('flat_map   ') { N.times {[a, b, c].flat_map(&:itself)} }
end

And here the results

# results for small arrays
       user     system      total        real
plus         0.000000   0.000000   0.000000 (  0.000416)
concat       0.000000   0.000000   0.000000 (  0.000592)
push         0.000000   0.000000   0.000000 (  0.000441)
<<           0.000000   0.000000   0.000000 (  0.003387)
splash       0.000000   0.000000   0.000000 (  0.000789)
concat_all   0.000000   0.000000   0.000000 (  0.001480)
concat_all3  0.016000   0.000000   0.016000 (  0.003496)
flat_map     0.000000   0.000000   0.000000 (  0.001036)

# results for big arrays
       user     system      total        real
plus         0.686000   0.671000   1.357000 (  1.351171)
concat       0.890000   0.733000   1.623000 (  1.630155)
push         1.466000   0.624000   2.090000 (  2.092684)
<<          23.837000   1.045000  24.882000 ( 24.885238)
splash       1.029000   1.264000   2.293000 (  2.332560)
concat_all   0.687000   0.967000   1.654000 (  1.709321)
concat_all2  0.936000   0.780000   1.716000 (  1.730428)
concat_all3 24.242000   0.998000  25.240000 ( 25.278264)
flat_map     0.780000   0.765000   1.545000 (  1.551654)

Upvotes: 4

Max
Max

Reputation: 22385

If you've already determined that multiple concatenation is the fastest method, you can write it nicer using reduce:

[foo, bar, baz].reduce([], :concat)

Upvotes: 34

Related Questions