Reputation: 18097
I have two arrays, one with data and one with indexes. I want to know if there is a good way to delete the elements in data at the positions given in indexes. I could do a simple iteration, but I am wondering what the shortest way is:
data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]
# some code here
Elements in data should be removed wherever their position coincides with a number in indexes. The result should look like this:
['a','b','a','b','a','b']
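For reference, the simple iteration I mention would look something like this sketch:

```ruby
data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]

result = []
data.each_with_index do |value, index|
  # Keep the value unless its position is listed in indexes.
  result << value unless indexes.include?(index)
end
result # => ["a", "b", "a", "b", "a", "b"]
```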
Upvotes: 6
Views: 108
Reputation: 160551
Doing it without iteration might seem like a good goal, but iteration done right is going to be extremely fast.
Benchmarks are important:
require 'benchmark'
DATA = ['a','b','c','a','b','c','a','b','c']
INDEXES = [2,5,8]
def ttm(data)
  d2 = data.dup
  INDEXES.sort.reverse.each { |i| d2.delete_at(i) }
  d2
end
def devon_parsons(data)
  new_data = data.each_with_index.reject do |value, index|
    INDEXES.include? index
  end.map(&:first)
  new_data
end
def arup_rakshit(data)
  data.values_at(*(0...data.size).to_a - INDEXES)
end
def sawa(data)
  data.values_at(*data.each_index.to_a - INDEXES)
end
Make sure it's an apples to apples test:
ttm(DATA) # => ["a", "b", "a", "b", "a", "b"]
devon_parsons(DATA) # => ["a", "b", "a", "b", "a", "b"]
arup_rakshit(DATA) # => ["a", "b", "a", "b", "a", "b"]
sawa(DATA) # => ["a", "b", "a", "b", "a", "b"]
Run the benchmarks:
n = 100_000
Benchmark.bm(13) do |b|
  b.report('ttm:') { n.times { ttm(DATA) } }
  b.report('devon_parsons') { n.times { devon_parsons(DATA) } }
  b.report('arup_rakshit') { n.times { arup_rakshit(DATA) } }
  b.report('sawa') { n.times { sawa(DATA) } }
end
Which results in:
# >> user system total real
# >> ttm: 0.130000 0.000000 0.130000 ( 0.127559)
# >> devon_parsons 0.530000 0.000000 0.530000 ( 0.535929)
# >> arup_rakshit 0.250000 0.000000 0.250000 ( 0.255295)
# >> sawa 0.300000 0.010000 0.310000 ( 0.305376)
If the data size grows:
DATA2 = DATA * 100
Benchmark.bm(13) do |b|
  b.report('ttm:') { n.times { ttm(DATA2) } }
  b.report('devon_parsons') { n.times { devon_parsons(DATA2) } }
  b.report('arup_rakshit') { n.times { arup_rakshit(DATA2) } }
  b.report('sawa') { n.times { sawa(DATA2) } }
end
The results really change:
# >> user system total real
# >> ttm: 0.320000 0.090000 0.410000 ( 0.420074)
# >> devon_parsons 39.170000 0.080000 39.250000 ( 39.265062)
# >> arup_rakshit 9.950000 0.010000 9.960000 ( 9.975699)
# >> sawa 9.940000 0.020000 9.960000 ( 9.959036)
It's really important to test what happens as the array size changes. What might run quickly on a small array can slow dramatically as the array grows. And, too often, what seems like a cool way to do something turns out to be very slow because there are hidden costs. Benchmarks help us figure these things out.
Note: Using sort.reverse is very important. Deleting from the end of the array first means the earlier indexes still point where they should; deleting in ascending order shifts every later element left after each delete_at, and the array will be mangled.
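A quick sketch showing the mangling:

```ruby
data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]

# Ascending order: each delete_at shifts later elements left, so the
# remaining indexes no longer point at the intended elements.
mangled = data.dup
indexes.each { |i| mangled.delete_at(i) }
mangled # => ["a", "b", "a", "b", "c", "b", "c"]

# Descending order: later positions are removed first, so earlier
# indexes are unaffected.
correct = data.dup
indexes.sort.reverse.each { |i| correct.delete_at(i) }
correct # => ["a", "b", "a", "b", "a", "b"]
```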
It has been suggested that sort could be further improved to sort_by(&:itself), but benchmarks show otherwise:
require 'benchmark'
array = (0..99).to_a.shuffle
n = 100_000
Benchmark.bm(7) do |b|
  b.report('sort:') { n.times { array.sort } }
  b.report('sort_by:') { n.times { array.sort_by(&:itself) } }
end
Resulting in:
user system total real
sort: 0.460000 0.010000 0.470000 ( 0.480236)
sort_by: 3.600000 0.030000 3.630000 ( 3.627871)
Increasing the array size:
array = (0..999).to_a.shuffle
Benchmark.bm(13) do |b|
  b.report('sort:') { n.times { array.sort } }
  b.report('sort_by:') { n.times { array.sort_by(&:itself) } }
end
Resulting in:
user system total real
sort: 9.520000 0.120000 9.640000 ( 9.659246)
sort_by: 53.530000 0.720000 54.250000 ( 54.321285)
Upvotes: 4
Reputation: 18762
Here is my solution:
data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]
updated_data = data.dup
indexes.each { |i| updated_data[i] = nil }
updated_data.compact!
p updated_data # Prints ["a", "b", "a", "b", "a", "b"]
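One caveat worth noting: compact! also strips any nils that were already in data, so this approach assumes the array holds no legitimate nil values. A contrived example (hypothetical data):

```ruby
data = ['a', nil, 'c']  # array that already contains a nil
indexes = [2]           # we only want to drop 'c'

updated = data.dup
indexes.each { |i| updated[i] = nil }
updated.compact!
updated # => ["a"], the pre-existing nil is removed as well
```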
As far as benchmarks go, using the Tin Man's code, this solution seems to perform the best. I am not sure whether that has something to do with the small size of the indexes array.
user system total real
ttm: 0.125000 0.000000 0.125000 ( 0.113075)
devon_parsons 0.484000 0.000000 0.484000 ( 0.491327)
arup_rakshit 0.219000 0.000000 0.219000 ( 0.221149)
sawa 0.250000 0.000000 0.250000 ( 0.253168)
wandmaker 0.094000 0.016000 0.110000 ( 0.095063)
# Run 2 with larger data
user system total real
ttm: 0.422000 0.188000 0.610000 ( 0.596413)
devon_parsons 39.328000 0.000000 39.328000 ( 39.489394)
arup_rakshit 10.078000 0.562000 10.640000 ( 10.644099)
sawa 10.219000 0.110000 10.329000 ( 10.328250)
wandmaker 0.359000 0.062000 0.421000 ( 0.423282)
Upvotes: 0
Reputation: 1288
new_data = data.each_with_index.reject do |value, index|
  indexes.include? index
end.map(&:first)
New answer that actually works this time. It runs in O(n^2) in the worst case, since each of the n elements does a linear include? scan over indexes, and I don't see a way of doing it without iterating over indexes at all.
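If the repeated include? scans are a concern, one possible tweak (a sketch, using the standard library's Set) is to do the lookups against a Set, which makes each membership test effectively constant time:

```ruby
require 'set'

data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]

# Build the Set once, then each include? is a hash lookup rather than
# a linear scan of the indexes array.
index_set = indexes.to_set
new_data = data.each_with_index.reject do |_value, index|
  index_set.include?(index)
end.map(&:first)
new_data # => ["a", "b", "a", "b", "a", "b"]
```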
Upvotes: 1
Reputation: 118271
I would do it as below:
data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]
data.values_at(*(0...data.size).to_a - indexes)
# => ["a", "b", "a", "b", "a", "b"]
Upvotes: 4
Reputation: 168101
data.values_at(*data.each_index.to_a - indexes)
# => ["a", "b", "a", "b", "a", "b"]
Upvotes: 5