djphillovesyou

Reputation: 5

Ruby Script that doesn't print duplicate lines

What can I add to this script so that it does not print the duplicate lines from the text file?

The script is:

class TestKeyword
  file = File.new("test.txt", "r")
  while (line = file.gets)
    if line['MAY_DAY']
      date = line[/\w+ +\d+ +\d+:\d+:\d+/]
      puts "#{date}"
    end
  end
end

This is the test file:

Oct 15 12:54:01 WHERE IS THE LOVIN MAY_DAY
Oct 16 23:15:44 WHAT THE HECK CAN I DO ABOUT IT HUMP_DAY 
Oct 16 14:16:09 I LOVE MY BABY GIRL MAY_DAY 
Oct 16 08:25:18 CAN WAIT UNTIL MY BABY RECOVERS CRYSTAL_WIFE 
Oct 18 17:48:38 I HOPE HE STOP MESSING WITH THESE FOOLISH CHILDREN TONY_SMITH 
Oct 19 05:17:58 GAME TIME GO HEAD AND GET ME MAY_DAY 
Oct 20 10:23:33 GAMESTOP IS WHERE ITS AT GAME_DAY
Oct 21 03:54:27 WHAT IS GOING ON WITH MY LUNCH HUNGRY_MAN
Oct 15 12:54:01 WHERE IS THE LOVIN MAY_DAY
Oct 16 23:15:44 WHAT THE HECK CAN I DO ABOUT IT HUMP_DAY 
Oct 16 14:16:09 I LOVE MY BABY GIRL MAY_DAY 
Oct 16 08:25:18 CAN WAIT UNTIL MY BABY RECOVERS CRYSTAL_WIFE 
Oct 18 17:48:38 I HOPE HE STOP MESSING WITH THESE FOOLISH CHILDREN TONY_SMITH 
Oct 19 05:17:58 GAME TIME GO HEAD AND GET ME MAY_DAY 
Oct 20 10:23:33 GAMESTOP IS WHERE ITS AT GAME_DAY
Oct 21 03:54:27 WHAT IS GOING ON WITH MY LUNCH HUNGRY_MAN

Currently, when I execute the script, I get the following (the date and time of the lines that contain the keyword "MAY_DAY"):

1: Oct 15 12:54:01
1: Oct 16 14:16:09
1: Oct 19 05:17:58
1: Oct 15 12:54:01
1: Oct 16 14:16:09
1: Oct 19 05:17:58

The output I need is:

1: Oct 15 12:54:01
1: Oct 16 14:16:09
1: Oct 19 05:17:58

which doesn't have the duplicates.

Upvotes: 0

Views: 63

Answers (2)

struthersneil

Reputation: 2750

You're going to have to remember what lines you have already output with a little array, e.g.

class TestKeyword
  found = []  # dates that have already been printed
  file = File.new("test.txt", "r")
  while (line = file.gets)
    if line['MAY_DAY']
      date = line[/\w+ +\d+ +\d+:\d+:\d+/]
      # Only print a date the first time it is seen
      if !found.include? date
        found << date
        puts "#{date}"
      end
    end
  end
  file.close
end

See what I'm doing there? If the date isn't in the array, we add it to it and output the date. Otherwise we ignore it.

Edit: if you want to be a bit more advanced you can use a Set rather than an array. Sets are designed for fast lookup of unique elements. If the only question you want to ask is 'is this element in this set?' and you don't care about order, use a Set. To do that, just change this line:

found = []

To this (on Ruby versions older than 3.2 you also need require 'set' at the top of the script):

found = Set.new
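
As a rough sketch of the Set variant, the lookup and the insert can be folded into one call with Set#add?, which returns nil when the element is already present. The helper name unique_dates and the inline sample lines are made up for illustration; the real script would read from test.txt instead.

```ruby
require 'set'

# Hypothetical helper (not from the original script): collects the
# timestamp of each line containing `keyword`, keeping only the first
# occurrence of every timestamp.
def unique_dates(lines, keyword)
  seen = Set.new
  lines.each_with_object([]) do |line, dates|
    next unless line.include?(keyword)

    date = line[/\w+ +\d+ +\d+:\d+:\d+/]
    # Set#add? returns nil if the date was already in the set
    dates << date if date && seen.add?(date)
  end
end

# Sample data standing in for test.txt (assumption for this sketch)
sample_lines = [
  "Oct 15 12:54:01 WHERE IS THE LOVIN MAY_DAY",
  "Oct 16 23:15:44 WHAT THE HECK CAN I DO ABOUT IT HUMP_DAY",
  "Oct 16 14:16:09 I LOVE MY BABY GIRL MAY_DAY",
  "Oct 15 12:54:01 WHERE IS THE LOVIN MAY_DAY",
  "Oct 16 14:16:09 I LOVE MY BABY GIRL MAY_DAY"
]

puts unique_dates(sample_lines, "MAY_DAY")
```

With the real file, something like unique_dates(File.readlines("test.txt"), "MAY_DAY") should behave the same way.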

Upvotes: 1

Mark Thomas

Reputation: 37517

If the file isn't huge, this will print out the unique lines that match (file is the open File object from the question):

puts file.readlines.select { |l| l.include? "MAY_DAY" }.uniq

It doesn't apply the counter, but that is easily added.
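
One way the counter might be added, as a sketch: extract the timestamp from each matching line, drop duplicates with uniq, then number the results with map.with_index(1). This assumes the counter is meant to be a running number, which the question does not spell out; the helper name numbered_unique_dates and the sample lines are hypothetical.

```ruby
# Hypothetical helper (an assumption, not part of the original answer):
# returns strings like "1: Oct 15 12:54:01" for each unique timestamp
# found on lines that contain `keyword`.
def numbered_unique_dates(lines, keyword)
  lines.select { |line| line.include?(keyword) }
       .map { |line| line[/\w+ +\d+ +\d+:\d+:\d+/] }
       .compact                 # skip lines without a timestamp
       .uniq                    # keep the first occurrence only
       .map.with_index(1) { |date, n| "#{n}: #{date}" }
end

# Sample data standing in for test.txt (assumption for this sketch)
log_lines = [
  "Oct 15 12:54:01 WHERE IS THE LOVIN MAY_DAY",
  "Oct 16 14:16:09 I LOVE MY BABY GIRL MAY_DAY",
  "Oct 19 05:17:58 GAME TIME GO HEAD AND GET ME MAY_DAY",
  "Oct 15 12:54:01 WHERE IS THE LOVIN MAY_DAY"
]

puts numbered_unique_dates(log_lines, "MAY_DAY")
```

Because readlines loads the whole file into memory, this carries the same "file isn't huge" caveat as the one-liner.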

Upvotes: 1
