changingrainbows

Reputation: 2711

Filter an array of strings based on contents of array itself

How can I remove every string from an array that is a substring of another string in the same array? I want this array of strings:

arr = ["Bochum", "Stu", "Stut", "Stuttt", "Stutt", "Stuttgart", "Heesestr.", "Berl", "Berlin"]

to shrink to:

["Bochum", "Stuttt", "Stuttgart", "Heesestr.", "Berlin"]

Upvotes: 1

Views: 772

Answers (6)

William Hu

Reputation: 16189

Find the substrings and remove them. This may not be the best approach, but it is clear:

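ar = ["Bochum", "Stu", "Stut", "Stuttt", "Stutt", "Stuttgart", "Heesestr.", "Berl", "Berlin"]
sub_strings = []
ar.each do |string|
  # Collect every proper prefix of string that is itself an element of ar.
  (0...string.length).each do |index|
    prefix = string[0...index]
    sub_strings << prefix if ar.include?(prefix)
  end
end
results = ar - sub_strings
# => ["Bochum", "Stuttt", "Stuttgart", "Heesestr.", "Berlin"]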

Upvotes: 0

jtbandes

Reputation: 118781

Here's an implementation using a Trie-like data structure. It achieves the goal by simply losing information :-)

(I've assumed you only care about strings being prefixes of each other, rather than substrings...)

class LossyTrie
  def initialize; @dict = {}; end

  def add(str)
    # Break the new string apart into characters, traversing down the trie at each step.
    # As a side effect, if a prefix of str was already present, it will be forgotten.
    # Similarly, if str itself is a prefix of an existing string, nothing will change.
    dict = @dict
    str.each_char do |c|
      dict = (dict[c] ||= {})
    end
  end

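  def all_strings
    strs = []
    traverse(@dict, "") { |leaf| strs << leaf }
    strs
  end

  private

  # Depth-first walk that calls the block with the full string at every leaf
  # (a node without children).
  def traverse(dict, so_far, &block)
    dict.each do |k, v|
      if v.empty?
        block.call(so_far + k)
      else
        traverse(v, so_far + k, &block)
      end
    end
  end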
end

strs = ["Bochum", "Stu", "Stut", "Stuttt", "Stutt", "Stuttgart", "Heesestr.", "Berl", "Berlin"]

trie = LossyTrie.new
strs.each { |s| trie.add(s) }

trie.all_strings # => ["Bochum", "Berlin", "Stuttt", "Stuttgart", "Heesestr."]
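
To see why the trie "forgets" prefixes, it can help to look at the nested hash on its own. This is only a minimal sketch, separate from the class above; `build_trie` is a hypothetical helper name that repeats the same insertion loop as `add`.

```ruby
# Hypothetical helper: builds the same nested-hash structure that add creates.
def build_trie(strings)
  root = {}
  strings.each do |s|
    node = root
    # Walk down one level per character, creating nodes as needed.
    s.each_char { |c| node = (node[c] ||= {}) }
  end
  root
end

short_first = build_trie(["Berl", "Berlin"])
long_first  = build_trie(["Berlin", "Berl"])

# Both insertion orders give the same structure: a single path whose only
# leaf is the final "n", so nothing records that "Berl" was ever added.
short_first == long_first                            # => true
short_first.dig("B", "e", "r", "l").keys             # => ["i"]
short_first.dig("B", "e", "r", "l", "i", "n")        # => {}
```

Because a node stores no "a word ends here" marker, only the leaves can be recovered, which is exactly the behaviour the question asks for in the prefix case.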

Upvotes: 0

Yevgeniy Goyfman

Reputation: 492

No need for Rails, plain Ruby will do:

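my_array = ["Bochum", "Stu", "Stut", "Stuttt", "Stutt", "Stuttgart", "Heesestr.", "Berl", "Berlin"]

# Join once, before keep_if starts modifying the array in place.
# This assumes no element contains the "," separator.
joined = my_array.join(",")
my_array.keep_if { |x| joined.scan(x).length == 1 }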

Upvotes: 0

Todd A. Jacobs

Reputation: 84453

One-Liner with Sort, Grep, and Count

Assuming the shorter elements are always prefixes of the longer ones (that is, a match always starts at the beginning of the string), one way to remove them is to sort, which places each prefix ahead of its longer matches. You can then reject any element that has a longer match elsewhere in the array. For example:

array = %w[Bochum Stu Stut Stuttt Stutt Stuttgart Heesestr. Berl Berlin]
array.sort.reject { |elem| array.grep(/\A#{Regexp.escape(elem)}/).count > 1 }
#=> ["Berlin", "Bochum", "Heesestr.", "Stuttgart", "Stuttt"]

If your array shouldn't be sorted, then this is not the right solution for you. However, it definitely contains the right array elements, and is both short and easy to read. Your mileage may vary.
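
One caveat when interpolating array elements into a regular expression: metacharacters such as the `.` in "Heesestr." are treated as patterns unless they are escaped. The sketch below shows the difference `Regexp.escape` makes; the second word, "Heesestra", is made up purely for illustration.

```ruby
# "Heesestra" is a made-up element; it is not a real extension of "Heesestr.".
words = ["Heesestr.", "Heesestra"]

# Unescaped: the trailing "." matches any character, so "Heesestr." also
# matches "Heesestra" and is wrongly rejected.
unescaped = words.reject { |w| words.grep(/\A#{w}/).count > 1 }

# Escaped: the "." only matches a literal dot, so both words survive.
escaped = words.reject { |w| words.grep(/\A#{Regexp.escape(w)}/).count > 1 }

unescaped  # => ["Heesestra"]
escaped    # => ["Heesestr.", "Heesestra"]
```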

Upvotes: 0

Aetherus

Reputation: 8898

A solution that does not preserve the order:

["Bochum", "Stu", "Stut", "Stuttt", "Stutt", "Stuttgart",
   "Heesestr.", "Berlin", "Berl"].sort_by(&:size).reduce([]) do |ary, word|
  ary.reject{|s| word.include?(s)}.push(word)
end
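
If the original order does matter, one possible follow-up is to intersect the original array with the survivors afterwards. This is a sketch under that assumption; `remove_substrings` is a hypothetical method name, and `Array#&` also drops duplicate elements.

```ruby
# Hypothetical wrapper around the reduce above that restores the input order.
def remove_substrings(strings)
  survivors = strings.sort_by(&:size).reduce([]) do |ary, word|
    # Drop anything already collected that is contained in the current word.
    ary.reject { |s| word.include?(s) }.push(word)
  end
  # Array#& keeps the order of the receiver, i.e. the original array.
  strings & survivors
end

arr = ["Bochum", "Stu", "Stut", "Stuttt", "Stutt", "Stuttgart",
       "Heesestr.", "Berl", "Berlin"]
result = remove_substrings(arr)
# => ["Bochum", "Stuttt", "Stuttgart", "Heesestr.", "Berlin"]
```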

Upvotes: 0

Cary Swoveland

Reputation: 110755

If you're not opposed to the use of brute force:

arr = ["Bochum", "Stu", "Stut", "Stuttt", "Stutt", "Stuttgart",
       "Heesestr.", "Berl", "Berlin"]

arr.each_with_object([]) { |str,a|
  a << str unless arr.any? { |s| s.include?(str) && s.size > str.size } }
  #=> ["Bochum", "Stuttt", "Stuttgart", "Heesestr.", "Berlin"] 
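
Since `include?` matches anywhere in the string, this treats true substrings and prefixes alike. A small sketch of the difference, using a made-up element "gart" that is not in the question's data:

```ruby
# "gart" is made up for illustration: a substring of "Stuttgart" but not a prefix.
words = ["Stuttgart", "gart", "Stu"]

# Substring semantics: drop anything contained in a longer element.
by_substring = words.reject do |w|
  words.any? { |o| o.size > w.size && o.include?(w) }
end

# Prefix semantics: drop only what a longer element starts with.
by_prefix = words.reject do |w|
  words.any? { |o| o.size > w.size && o.start_with?(w) }
end

by_substring  # => ["Stuttgart"]
by_prefix     # => ["Stuttgart", "gart"]
```

Which of the two is wanted depends on whether the data only ever contains truncated beginnings, as in the question's example.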

Upvotes: 2
