Reputation: 766

Chaining method blocks (Ruby)

Consider the following code:

    lines = []
    File.foreach('file.csv').with_index do |line, line_num|
      lines.push(line.split(" ")) if line_num > 0
    end

    indices = lines.map { |el| el.last }
    duplicates = indices.select { |e| indices.count(e) > 2 }.uniq

For anyone wondering, the example CSV file looks like this:

    # Generated by tool XYZ
    a b c 1
    d e f 2
    g h i 1
    j k l 4
    m n o 5
    p q r 2
    s t u 2
    v w x 1
    y z 0 5

Is it possible to chain these two method blocks (the last two lines of code) together?

Upvotes: 0

Answers (3)

Cary Swoveland

Reputation: 110665

Example

Let's apply your code to an example.

str =<<-END
Now is the
time for all
people who are
known to all
of us as the
best coders are
expected to
lead all
those who are
less experienced
to greatness
END

FName = 'temp'
File.write(FName, str)
  #=> 146

Your code

lines = Array.new() 
File.foreach(FName).with_index do |line, line_num|                 
  lines.push(line.split(" ")) if line_num > 0                                 
end                                                                                  
lines
  #=> [["time", "for", "all"], ["people", "who", "are"], ["known", "to", "all"],
  #    ["of", "us", "as", "the"], ["best", "coders", "are"], ["expected", "to"],
  #    ["lead", "all"], ["those", "who", "are"], ["less", "experienced"],
  #    ["to", "greatness"]] 
indices = lines.map { |el| el.last }                                          
  #=> ["all", "are", "all", "the", "are", "to", "all", "are", "experienced", "greatness"] 
duplicates = indices.select { |e| indices.count(e) > 2 }
  #=> ["all", "are", "all", "are", "all", "are"] 
duplicates.uniq
  #=> ["all", "are"]

The objective is seen to be to return an array of all words that appear more than twice as the last word of a line, ignoring the first line.

More Ruby-like and more efficient code

We can do that more concisely and efficiently by making a single pass through the file:

first_line = true
h = Hash.new(0)
File.foreach(FName) do |line|
  if first_line
    first_line = false
  else
    h[line[/\S+(?=\n)/]] += 1
  end
end
h.select { |_,count| count > 2 }.keys
  #=> ["all", "are"]

Steps performed

The steps are as follows.

first_line = true
h = Hash.new(0)
File.foreach(FName) do |line|
  if first_line
    first_line = false
  else
    h[line[/\S+(?=\n)/]] += 1
  end
end
h #=> {"all"=>3, "are"=>3, "the"=>1, "to"=>1, "experienced"=>1, "greatness"=>1}
g = h.select { |_,count| count > 2 }
  #=> {"all"=>3, "are"=>3} 
g.keys
  #=> ["all", "are"]

Use of Enumerator#with_object

Rather than defining the hash before File.foreach(..) is executed, it's customary to use the method Enumerator#with_object, which returns the hash that is constructed and so allows us to chain the statements that follow onto it:

first_line = true
File.foreach(FName).with_object(Hash.new(0)) do |line, h|
  if first_line
    first_line = false
  else
    h[line[/\S+(?=\n)/]] += 1
  end
end.select { |_,count| count > 2 }.keys
  #=> ["all", "are"]

Use of a counting hash

I define the hash as follows.

h = Hash.new(0)

This uses the form of Hash::new that defines a default value equal to new's argument. If h = Hash.new(0) and h does not have a key k, h[k] returns the default value, zero. Ruby's parser expands the expression h[k] += 1 to:

h[k] = h[k] + 1

If h does not have a key k, the expression becomes

h[k] = 0 + 1

Note that h[k] = h[k] + 1 is shorthand for:

h.[]=(k, h.[](k) + 1)

It is the method Hash#[] that defaults to zero, not the method Hash#[]=.
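
A short illustration of that last point, using made-up keys. count_words is a hypothetical helper added only for this sketch.

```ruby
# Hypothetical helper: count occurrences with a counting hash.
def count_words(words)
  counts = Hash.new(0)                      # missing keys read as 0
  words.each { |word| counts[word] += 1 }   # counts[word] = counts[word] + 1
  counts
end

h = Hash.new(0)
h["missing"]        #=> 0, the default returned by Hash#[]
h.key?("missing")   #=> false, reading did not store the key
h["seen"] += 1      # Hash#[]= stores 0 + 1
h["seen"]           #=> 1
h.keys              #=> ["seen"]

counts = count_words(%w[all are all the are all])
counts["all"]       #=> 3
counts["none"]      #=> 0
```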

Using a regular expression to extract the last word of each line

One of the lines is

str = "known to all\n"

We can use the regular expression r = /\S+(?=\n)/ to extract the last word:

str[r] #=> "all"

The regular expression reads, "match one or more (+) characters that are not whitespace characters (\S), immediately followed by a newline character". (?=\n) is a positive lookahead: "\n" must be matched, but it is not part of the match returned.
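
Because the lookahead requires a newline directly after the word, the expression returns nil for a final line with no trailing newline, or for a line with trailing spaces. The sketch below checks those cases and compares them with String#split. The split-based variant is an alternative assumed here, not part of the code above, and both helper names are made up.

```ruby
# The regular expression from above, wrapped in a hypothetical helper.
def last_word_by_regex(line)
  line[/\S+(?=\n)/]
end

# Hypothetical alternative: split on whitespace and take the last field.
def last_word_by_split(line)
  line.split.last
end

last_word_by_regex("known to all\n")    #=> "all"
last_word_by_regex("known to all")      #=> nil, no newline to look ahead to
last_word_by_regex("known to all \n")   #=> nil, a space precedes the newline

last_word_by_split("known to all\n")    #=> "all"
last_word_by_split("known to all")      #=> "all"
last_word_by_split("known to all \n")   #=> "all"
last_word_by_split("\n")                #=> nil
```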

Upvotes: 0

Aleksei Matiushkin

Reputation: 121000

An O(N) solution (single pass) would look like:

lines.each_with_object([[], []]) do |el, (result, temp)|
  (temp.delete(el) ? result : temp) << el
end.first

Here we use an intermediate

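Also, you can always use Kernel#then (Ruby 2.6+; Object#tap would not work here, because it returns its receiver rather than the block's result):

duplicates =
  lines.map(&:last).then do |indices|
    indices.select { |e| indices.count(e) > 2 }.uniq
  end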
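
Object#tap returns its receiver and discards the block's value, whereas Kernel#then returns the block's value. A quick check on the last fields of the question's sample file (the helper names are made up, and Ruby 2.6+ is assumed for then):

```ruby
# Hypothetical helpers that differ only in tap versus then.
def duplicates_via_tap(values)
  values.tap { |ix| ix.select { |e| ix.count(e) > 2 }.uniq }
end

def duplicates_via_then(values)
  values.then { |ix| ix.select { |e| ix.count(e) > 2 }.uniq }
end

indices = %w[1 2 1 4 5 2 2 1 5]

duplicates_via_tap(indices).equal?(indices)   #=> true, the same array comes back
duplicates_via_then(indices)                  #=> ["1", "2"]
```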

Upvotes: 1

Babar Al-Amin

Reputation: 3984

If you don't want to have an intermediate variable and want to do it in a single line, you can write something like this:

duplicates = lines.group_by(&:last).select { |_, v| v.count > 2 }.keys

For some people this might hurt readability, though. It just depends on your taste.
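
To make the one-liner easier to follow, here is a sketch that splits it across lines and runs it on the rows of the question's sample file. frequent_last_fields and the other names are illustrative only.

```ruby
# Hypothetical helper wrapping the one-liner above.
def frequent_last_fields(rows, min = 2)
  rows.group_by(&:last)                          # last field => rows ending in it
      .select { |_, group| group.size > min }    # keep fields seen more than min times
      .keys
end

# Rows of the sample file, header line already dropped
rows = [
  %w[a b c 1], %w[d e f 2], %w[g h i 1],
  %w[j k l 4], %w[m n o 5], %w[p q r 2],
  %w[s t u 2], %w[v w x 1], %w[y z 0 5]
]

rows.group_by(&:last)["1"].size   #=> 3
rows.group_by(&:last)["5"].size   #=> 2
frequent_last_fields(rows)        #=> ["1", "2"]
```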

Upvotes: 2
