Reputation: 766
Consider the following code:
lines = Array.new()
File.foreach('file.csv').with_index do |line, line_num|
  lines.push(line.split(" ")) if line_num > 0
end
indices = lines.map { |el| el.last }
duplicates = indices.select{ |e| indices.count(e) > 2 }.uniq
For anyone wondering, an example CSV file looks like this:
# Generated by tool XYZ
a b c 1
d e f 2
g h i 1
j k l 4
m n o 5
p q r 2
s t u 2
v w x 1
y z 0 5
Is it possible to chain these two method calls (the last two lines of code) together?
Upvotes: 0
Views: 838
Reputation: 110665
Example
Let's apply your code to an example.
str =<<-END
Now is the
time for all
people who are
known to all
of us as the
best coders are
expected to
lead all
those who are
less experienced
to greatness
END
FName = 'temp'
File.write(FName, str)
#=> 146
Your code
lines = Array.new()
File.foreach(FName).with_index do |line, line_num|
  lines.push(line.split(" ")) if line_num > 0
end
lines
#=> [["time", "for", "all"], ["people", "who", "are"], ["known", "to", "all"],
# ["of", "us", "as", "the"], ["best", "coders", "are"], ["expected", "to"],
# ["lead", "all"], ["those", "who", "are"], ["less", "experienced"],
# ["to", "greatness"]]
indices = lines.map { |el| el.last }
#=> ["all", "are", "all", "the", "are", "to", "all", "are", "experienced", "greatness"]
duplicates = indices.select { |e| indices.count(e) > 2 }
#=> ["all", "are", "all", "are", "all", "are"]
duplicates.uniq
#=> ["all", "are"]
Your code is seen to return an array of all words that appear as the last word of a line (other than the first line) more than twice.
More Ruby-like and more efficient code
We can do that more concisely and efficiently by making a single pass through the file:
first_line = true
h = Hash.new(0)
File.foreach(FName) do |line|
  if first_line
    first_line = false
  else
    h[line[/\S+(?=\n)/]] += 1
  end
end
h.select { |_,count| count > 2 }.keys
#=> ["all", "are"]
Steps performed
The steps are as follows.
first_line = true
h = Hash.new(0)
File.foreach(FName) do |line|
  if first_line
    first_line = false
  else
    h[line[/\S+(?=\n)/]] += 1
  end
end
h #=> {"all"=>3, "are"=>3, "the"=>1, "to"=>1, "experienced"=>1, "greatness"=>1}
g = h.select { |_,count| count > 2 }
#=> {"all"=>3, "are"=>3}
g.keys
#=> ["all", "are"]
Use of Enumerator#with_object
Rather than defining the hash before File.foreach(..) is executed, it's customary to use the method Enumerator#with_object, which allows us to chain the hash that is constructed to the statements that follow:
first_line = true
File.foreach(FName).with_object(Hash.new(0)) do |line, h|
  if first_line
    first_line = false
  else
    h[line[/\S+(?=\n)/]] += 1
  end
end.select { |_,count| count > 2 }.keys
#=> ["all", "are"]
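The first_line flag can also be dropped so that the whole computation is a single chain. The following is only a sketch under a few assumptions: the sample text and temporary file are made up for illustration, and the last word is taken with String#split instead of the regular expression. Note that drop(1) collects the remaining lines into an array, so the file is no longer processed line by line.

```ruby
require 'tempfile'

# Hypothetical sample data; the first line is a header to be skipped.
sample = "header line\na b x\nc d y\ne f x\ng h x\n"

result = Tempfile.create('sample') do |f|
  f.write(sample)
  f.flush
  File.foreach(f.path)
      .drop(1) # skip the header line
      .each_with_object(Hash.new(0)) { |line, h| h[line.split.last] += 1 }
      .select { |_, count| count > 2 }
      .keys
end
result #=> ["x"]
```

Whether this reads better than the explicit flag is a matter of taste; for very large files the flag version avoids building the intermediate array.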
Use of a counting hash
I define the hash as follows.
h = Hash.new(0)
This uses the form of Hash::new that defines a default value equal to new's argument. If h = Hash.new(0) and h does not have a key k, h[k] returns the default value, zero. Ruby's parser expands the expression h[k] += 1 to:
h[k] = h[k] + 1
If h does not have a key k, the expression becomes
h[k] = 0 + 1
Note that h[k] = h[k] + 1 is shorthand for:
h.[]=(k, h.[](k) + 1)
It is the method Hash#[] that defaults to zero, not the method Hash#[]=.
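A small sketch of that behaviour, using a made-up key: reading a missing key returns the default without storing anything, and only the assignment adds the key.

```ruby
h = Hash.new(0)

read_value = h["cat"]       # Hash#[] returns the default, 0
stored     = h.key?("cat")  # false: reading did not add the key

h["cat"] += 1               # h["cat"] = h["cat"] + 1, i.e. 0 + 1
h["cat"] += 1               # now 1 + 1
h["cat"]                    #=> 2
```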
Using a regular expression to extract the last word of each line
One of the lines is
str = "known to all\n"
We can use the regular expression r = /\S+(?=\n)/ to extract the last word:
str[r] #=> "all"
The regular expression reads, "match one or more (+) characters that are not whitespace characters (\S), immediately followed by a newline character". (?=\n) is a positive lookahead: "\n" must be matched, but it is not part of the match returned.
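One caveat worth checking: because the lookahead requires a newline, a final line without a trailing newline produces no match, and the counting hash would then receive a nil key. The sketch below shows that case and one possible alternative, String#split followed by last; the variable names are only for illustration.

```ruby
r = /\S+(?=\n)/

with_newline    = "known to all\n"
without_newline = "known to all" # e.g. a last line with no trailing newline

from_regex_a = with_newline[r]    #=> "all"
from_regex_b = without_newline[r] #=> nil

# String#split with no arguments splits on whitespace, so the newline is ignored.
from_split_a = with_newline.split.last    #=> "all"
from_split_b = without_newline.split.last #=> "all"
```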
Upvotes: 0
Reputation: 121000
An O(N) solution (single pass) would look like:
lines.each_with_object([[], []]) do |el, (result, temp)|
  (temp.delete(el) ? result : temp) << el
end.first
Here we use an intermediate array (temp) to hold elements that have been seen but not yet matched by a repeat.
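To see what this snippet collects, here is a sketch that applies it to the last fields of the question's sample rows (an assumption on my part: the snippet as written iterates over lines, i.e. whole rows). A value moves to result each time a new sighting cancels a pending one in temp, so the snippet reports values seen at least twice, which is not the same as the question's count > 2 test. Also note that Array#delete scans temp, so the pass is linear only while temp stays small.

```ruby
indices = %w[1 2 1 4 5 2 2 1 5] # last fields of the question's sample rows

pairs = indices.each_with_object([[], []]) do |el, (result, temp)|
  # If el is pending in temp, remove it and record it; otherwise mark it pending.
  (temp.delete(el) ? result : temp) << el
end.first
pairs #=> ["1", "2", "5"]

# For comparison, the question's criterion (more than two occurrences):
more_than_twice = indices.uniq.select { |e| indices.count(e) > 2 }
more_than_twice #=> ["1", "2"]
```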
Also, you can always use Object#then (Ruby 2.6+; Object#tap would return its receiver rather than the block's result):
duplicates =
  lines.map(&:last).then do |indices|
    indices.select { |e| indices.count(e) > 2 }.uniq
  end
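When choosing between the two, keep in mind that Object#tap yields the receiver and returns the receiver, so the block's value is discarded, whereas Object#then (Ruby 2.6+, also available as yield_self since 2.5) returns the block's value. A minimal check with hypothetical sample values:

```ruby
indices = %w[1 2 1 2 2 1 5] # hypothetical sample values

via_tap  = indices.tap  { |a| a.select { |e| a.count(e) > 2 }.uniq }
via_then = indices.then { |a| a.select { |e| a.count(e) > 2 }.uniq }

tap_returned_receiver = via_tap.equal?(indices) #=> true
via_then                                        #=> ["1", "2"]
```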
Upvotes: 1
Reputation: 3984
If you don't want to have an intermediate variable and want to do it in a single line, you can write something like this:
duplicates = lines.group_by(&:last).select{|k, v| v.count > 2}.keys
For some people, this might hinder readability though! Just depends on your taste.
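As a quick check, here is the one-liner run against the rows of the question's sample file, written out by hand as they would come from line.split(" "). The second expression is an alternative I am adding, not part of the original answer: Enumerable#tally (Ruby 2.7+) counts the last fields without keeping the grouped rows around.

```ruby
# Rows as produced by line.split(" ") on the question's sample file.
lines = [
  %w[a b c 1], %w[d e f 2], %w[g h i 1], %w[j k l 4], %w[m n o 5],
  %w[p q r 2], %w[s t u 2], %w[v w x 1], %w[y z 0 5]
]

by_group = lines.group_by(&:last).select { |_, rows| rows.count > 2 }.keys
by_group #=> ["1", "2"]

# Enumerable#tally (Ruby 2.7+) builds a value => count hash directly.
by_tally = lines.map(&:last).tally.select { |_, count| count > 2 }.keys
by_tally #=> ["1", "2"]
```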
Upvotes: 2