Reputation: 5
What can I add to this script so that it does not print the duplicate lines from the txt file?
The script is:
class TestKeyword
  file = File.new("test.txt", "r")
  while (line = file.gets)
    if line['MAY_DAY']
      date = line[/\w+ +\d+ +\d+:\d+:\d+/]
      puts "#{date}"
    end
  end
end
This is the test file:
Oct 15 12:54:01 WHERE IS THE LOVIN MAY_DAY
Oct 16 23:15:44 WHAT THE HECK CAN I DO ABOUT IT HUMP_DAY
Oct 16 14:16:09 I LOVE MY BABY GIRL MAY_DAY
Oct 16 08:25:18 CAN WAIT UNTIL MY BABY RECOVERS CRYSTAL_WIFE
Oct 18 17:48:38 I HOPE HE STOP MESSING WITH THESE FOOLISH CHILDREN TONY_SMITH
Oct 19 05:17:58 GAME TIME GO HEAD AND GET ME MAY_DAY
Oct 20 10:23:33 GAMESTOP IS WHERE ITS AT GAME_DAY
Oct 21 03:54:27 WHAT IS GOING ON WITH MY LUNCH HUNGRY_MAN
Oct 15 12:54:01 WHERE IS THE LOVIN MAY_DAY
Oct 16 23:15:44 WHAT THE HECK CAN I DO ABOUT IT HUMP_DAY
Oct 16 14:16:09 I LOVE MY BABY GIRL MAY_DAY
Oct 16 08:25:18 CAN WAIT UNTIL MY BABY RECOVERS CRYSTAL_WIFE
Oct 18 17:48:38 I HOPE HE STOP MESSING WITH THESE FOOLISH CHILDREN TONY_SMITH
Oct 19 05:17:58 GAME TIME GO HEAD AND GET ME MAY_DAY
Oct 20 10:23:33 GAMESTOP IS WHERE ITS AT GAME_DAY
Oct 21 03:54:27 WHAT IS GOING ON WITH MY LUNCH HUNGRY_MAN
Currently, when I execute the script, I get the following (the date and time of the lines that contain the keyword "MAY_DAY"):
1: Oct 15 12:54:01
1: Oct 16 14:16:09
1: Oct 19 05:17:58
1: Oct 15 12:54:01
1: Oct 16 14:16:09
1: Oct 19 05:17:58
The output I need is:
1: Oct 15 12:54:01
1: Oct 16 14:16:09
1: Oct 19 05:17:58
which doesn't have the duplicates.
Upvotes: 0
Views: 63
Reputation: 2750
You're going to have to remember what lines you have already output with a little array, e.g.
class TestKeyword
  found = []  # dates that have already been printed
  file = File.new("test.txt", "r")
  while (line = file.gets)
    if line['MAY_DAY']
      date = line[/\w+ +\d+ +\d+:\d+:\d+/]
      # Print a date only the first time it is seen
      unless found.include?(date)
        found << date
        puts date
      end
    end
  end
  file.close
end
See what I'm doing there? If the date isn't in the array, we add it to it and output the date. Otherwise we ignore it.
Edit: if you want to be a bit more advanced you can use a Set rather than an array. Sets are designed for fast lookup of unique elements. If the only question you want to ask is 'is this element in this set?' and you don't care about order, use a Set (on Ruby versions before 3.2 this also needs require 'set' at the top of the script). To do that, just change this line:
found = []
To this:
found = Set.new
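For illustration, here is a minimal sketch of the Set variant described above. The method name `unique_dates` and the inline `sample` array are my own assumptions (they are not in the question); the regex is the one from the original script.

```ruby
require "set"

# Hypothetical helper (not from the question): returns the timestamps of
# lines containing `keyword`, keeping only the first occurrence of each.
def unique_dates(lines, keyword = "MAY_DAY")
  seen = Set.new
  dates = []
  lines.each do |line|
    next unless line.include?(keyword)
    date = line[/\w+ +\d+ +\d+:\d+:\d+/]
    # Set#add? returns nil when the element is already in the set
    dates << date if date && seen.add?(date)
  end
  dates
end

# Sample data standing in for test.txt
sample = [
  "Oct 15 12:54:01 WHERE IS THE LOVIN MAY_DAY",
  "Oct 16 23:15:44 WHAT THE HECK CAN I DO ABOUT IT HUMP_DAY",
  "Oct 16 14:16:09 I LOVE MY BABY GIRL MAY_DAY",
  "Oct 15 12:54:01 WHERE IS THE LOVIN MAY_DAY",
]
puts unique_dates(sample)
```

To run it against the real file, something like `puts unique_dates(File.foreach("test.txt"))` should work, since the method only needs an object that responds to `each`.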
Upvotes: 1
Reputation: 37517
If the file isn't huge, this will print out the unique lines that match:
puts file.readlines.select { |l| l.include?("MAY_DAY") }.uniq
It doesn't apply the counter, but that is easily added.
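As a rough sketch of how the counter could be added to this approach: the version below extracts the date with the regex from the question, removes duplicates, and numbers the results. The method name `numbered_unique_dates` and the `sample` array are hypothetical, and I am assuming the counter is meant to be a running number. The expected output in the question shows `1:` on every line, so adjust the prefix if that is really what is wanted.

```ruby
# Hypothetical helper (not from the question): returns "N: date" strings for
# the unique timestamps of lines containing `keyword`.
def numbered_unique_dates(lines, keyword = "MAY_DAY")
  lines
    .select { |l| l.include?(keyword) }        # keep matching lines
    .map { |l| l[/\w+ +\d+ +\d+:\d+:\d+/] }    # extract the timestamp
    .compact                                   # drop lines with no timestamp
    .uniq                                      # remove duplicate timestamps
    .each_with_index
    .map { |date, i| "#{i + 1}: #{date}" }     # assumed: running counter from 1
end

# Sample data standing in for test.txt
sample = [
  "Oct 15 12:54:01 WHERE IS THE LOVIN MAY_DAY",
  "Oct 16 14:16:09 I LOVE MY BABY GIRL MAY_DAY",
  "Oct 20 10:23:33 GAMESTOP IS WHERE ITS AT GAME_DAY",
  "Oct 15 12:54:01 WHERE IS THE LOVIN MAY_DAY",
  "Oct 16 14:16:09 I LOVE MY BABY GIRL MAY_DAY",
]
puts numbered_unique_dates(sample)
```

With the real file, `File.readlines("test.txt")` can be passed in place of `sample`. Like the one-liner, this reads the whole file into memory, so it suits files that are not huge.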
Upvotes: 1