RD Ward

Reputation: 6737

How can I use real-time monitoring (tail -f), cut, sort, and uniq together in Unix?

I am trying to delete duplicate text that's being written into a log, while continuously monitoring it.

The only issue is that this particular log is timestamped, so before it's possible to determine if the same text is written twice or three times in a row, the timestamp must be cut.

I'm not a Unix expert, but this is my attempt:

tail -f log.txt | cut -c 28- | sort | uniq

The terminal behaves unexpectedly and just hangs, whereas either of the following two commands works on its own:

tail -f log.txt | cut -c 28-

or

tail -f log.txt | uniq

Ideally I'd also like to filter out non-adjacent duplicate entries, i.e. I would like to be able to use sort as well, but currently I can't get it to work with tail's -f flag.

Upvotes: 1

Views: 1109

Answers (1)

Blckknght

Reputation: 104722

You can't get sorted output of a stream of text before it has ended, since the next item to come in might belong ahead of the first one you've seen so far. That's why your pipeline hangs: sort waits for end-of-input before printing anything, and with tail -f the input never ends, so the sort | uniq part of your pipeline can't work for your situation.

While it's possible to filter out your duplicates with some more complicated shell scripting (see the awk sketch at the end of this answer), you might find it easier to write a script in some other language. Many scripting languages have efficient set data structures that can quickly check whether an item has been seen before. Here's a fairly trivial script that should do the job using Python 3:

#!/usr/bin/env python3

import sys

# Remember every distinct line printed so far.
seen = set()
for line in sys.stdin:
    if line not in seen:
        sys.stdout.write(line)
        sys.stdout.flush()  # stdout is block-buffered when piped, so flush each new line
        seen.add(line)
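
Assuming you save the script as, say, dedupe.py (a name made up here for the example) and mark it executable, it drops into your original pipeline in place of sort | uniq:

tail -f log.txt | cut -c 28- | ./dedupe.py

On GNU systems you may also want stdbuf -oL cut -c 28- so that cut line-buffers its output instead of writing to the pipe in delayed blocks.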

The downside to this approach is that the filtering script will use much more memory than uniq does, since it must remember every unique line it has seen before. So, this might not be an appropriate solution if your pipeline may see a great many different lines in a single run.
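
If you'd rather stay in the shell after all, a common awk idiom does the same first-seen filtering, with the same ever-growing memory caveat; this is a sketch of the "more complicated shell scripting" mentioned above:

tail -f log.txt | cut -c 28- | awk '!seen[$0]++'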

Upvotes: 3
