sadiq.ali

Reputation: 546

Print first occurrence of each unique regex match with line number

Given a regex, I want to print the first occurrence of each unique match, together with its line number, using bash.

For example, if the regex is .*Exception, I want to print:

$ ./script.sh file.log
6255:2016-09-07 10:05:37,886 ERROR some text java.lang.IllegalMonitorStateException
6714:2016-09-07 10:12:09,514 ERROR some text java.lang.NullPointerException
7013:2016-09-07 10:19:19,950 ERROR some text java.lang.IllegalStateException

I came up with a version, but it is very slow :( (on git-bash). Any pointers on how to improve its performance are appreciated.

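FILE_NAME=$1

# Slow: the file is rescanned once for every distinct matching line.
# read -r keeps backslashes intact; grep -F treats the line as a literal string.
while IFS= read -r line
do
    grep -F -m1 -n -- "$line" "$FILE_NAME"
done < <(grep '\b[^ ]*Exception\b' "$FILE_NAME" | sort -u) | sort -n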

Update (adding sample data):

2016-09-07 23:58:55,674 ERROR [STDERR] (pool-18-thread-1) Continuing ...
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.InstantiationException: java.sql.Timestamp
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) Continuing ...
2016-09-07 23:56:16,273 WARN  [com.arjuna.ats.jta.logging.loggerI18N] (Thread-12) [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery  got XA exception javax.transaction.xa.XAException, XAException.XAER_RMERR
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) Continuing ...
2016-09-07 23:58:26,304 WARN  [com.arjuna.ats.jta.logging.loggerI18N] (Thread-12) [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery  got XA exception javax.transaction.xa.XAException, XAException.XAER_RMERR

The above should produce:

2:2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.InstantiationException: java.sql.Timestamp
4:2016-09-07 23:56:16,273 WARN  [com.arjuna.ats.jta.logging.loggerI18N] (Thread-12) [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery  got XA exception javax.transaction.xa.XAException, XAException.XAER_RMERR
5:2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();

Upvotes: 1

Views: 118

Answers (2)

James Brown

Reputation: 37394

In GNU awk:

$ awk '/Exception/ && !seen[gensub(/^([^ ]* ){2}/,"","g")]++ {print NR,$0}' file.log
2 2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.InstantiationException: java.sql.Timestamp
4 2016-09-07 23:56:16,273 WARN  [com.arjuna.ats.jta.logging.loggerI18N] (Thread-12) [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery  got XA exception javax.transaction.xa.XAException, XAException.XAER_RMERR
5 2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();

Print the record if:

  • /Exception/: the record contains Exception
  • &&: and
  • !seen[...]++: the key has not been seen earlier
  • gensub(/^([^ ]* ){2}/,"","g"): the key, created by removing everything from the start (^) through the second space, i.e. the date and time
  • print NR,$0: print the current record number and the record
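
As a rough sketch, the same single-pass idea can be written without gensub, so it should also run on awks other than GNU awk. This variant keys on the matched exception name rather than on the rest of the line, and it separates the record number from the record with a colon, as in the question's expected output. The function name first_exceptions and the pattern [^ ]*Exception are assumptions chosen for illustration.

```shell
# first_exceptions FILE   (hypothetical helper name)
# Print "line_number:line" for the first line on which each distinct
# exception name appears. Uses only POSIX awk features.
first_exceptions() {
    awk '
        # match() sets RSTART and RLENGTH to the leftmost-longest match
        match($0, /[^ ]*Exception/) {
            key = substr($0, RSTART, RLENGTH)   # the matched exception name
            if (!seen[key]++)                   # true only the first time
                print NR ":" $0
        }
    ' "$1"
}
```

Calling first_exceptions file.log reads the file once, so the cost no longer grows with the number of distinct exceptions.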

Upvotes: 1

Sundeep

Reputation: 23667

$ cat ip.txt 
2016-09-07 23:58:55,674 ERROR [STDERR] (pool-18-thread-1) Continuing ...
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.InstantiationException: java.sql.Timestamp
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) Continuing ...
2016-09-07 23:56:16,273 WARN  [com.arjuna.ats.jta.logging.loggerI18N] (Thread-12) [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery  got XA exception javax.transaction.xa.XAException, XAException.XAER_RMERR
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) Continuing ...
2016-09-07 23:58:26,304 WARN  [com.arjuna.ats.jta.logging.loggerI18N] (Thread-12) [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery  got XA exception javax.transaction.xa.XAException, XAException.XAER_RMERR

$ perl -ne '($e)=/(\w+Exception)/; print "$.:$_" if !$seen{$e}++ && /Exception/' ip.txt
2:2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.InstantiationException: java.sql.Timestamp
4:2016-09-07 23:56:16,273 WARN  [com.arjuna.ats.jta.logging.loggerI18N] (Thread-12) [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery  got XA exception javax.transaction.xa.XAException, XAException.XAER_RMERR
5:2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();

  • ($e)=/(\w+Exception)/ saves the type of exception in the $e variable
  • !$seen{$e}++ makes sure only the first line matching each exception is printed
  • && /Exception/ prints only lines containing Exception
  • print "$.:$_" prints the line number, a colon and the input line


Edit:

This should work too, and it should be faster, since each line is matched only once:

perl -ne 'if(/(\w+Exception)/){print "$.:$_" if !$seen{$1}++}' ip.txt

Upvotes: 1
