ErAB

Reputation: 905

grep unique occurrences

I have a log file (file.log) in which ids such as 82244956 occur multiple times. file.log was created with this command:

gzip -cd /opt/log.gz | grep "JBOSS1-1" >> ~/file.log

Example:

2012-04-10 09:01:18,196 LOG  (7ysdhsdjfhsdhjkwe:IN) JBOSS1-1 (RP-yedgdh5567) [PayPalWeb] Fetch data with id: 82244956  
2012-04-10 09:02:18,196 LOG  (24343sdjjkidgyuwe:IN) JBOSS1-1 (RP-yedgdh5567) [PayPalWeb] Fetch data with id: 82244956  
2012-04-10 09:03:18,196 LOG  (6744443jfhsdgyuwe:IN) JBOSS1-1 (RP-yedgdh5567) [PayPalWeb] Fetch data with id: 82244957  
2012-04-10 09:04:18,196 LOG  (7ysdhsd5677dgyuwe:IN) JBOSS1-1 (RP-yedgdh5567) [PayPalWeb] Fetch data with id: 82244957  

Likewise, we have 10000 rows with different ids, and each id repeats 2-3 times. In the example above, the first two rows share id 82244956 and the last two rows share id 82244957. We need a result set with one row per UNIQUE id (any one of the matching rows will do), i.e.:

2012-04-10 09:01:18,196 LOG  (7ysdhsdjfhsdhjkwe:IN) JBOSS1-1 (RP-yedgdh5567) [PayPalWeb] Fetch data with id: 82244956  
2012-04-10 09:03:18,196 LOG  (6744443jfhsdgyuwe:IN) JBOSS1-1 (RP-yedgdh5567) [PayPalWeb] Fetch data with id: 82244957  

I tried this awk program on Linux, but it did not work:

awk ' { arr[$1]=$0 } END { for ( key in arr ) { print arr[key] } } ' file.log >> final-report.log

A better way might be to create file.log with distinct ids only in the first place.

Please advise how I can modify it.

Upvotes: 0

Views: 1372

Answers (3)

lily

Reputation: 11

You can get the result by running the following script. To keep the first record for each id, the main processing block checks whether the id has already been seen before it stores the line.

# Key each line on the text after "id:" and store only the first line per key.
awk '{ split($0, arr, "id:"); id_num = arr[2]
       if (!(id_num in dic)) { line[id_num] = $0; dic[id_num] } }
     END { for (i in line) print line[i] }' file.log > result.log

Upvotes: 0

Tedee12345

Reputation: 1210

awk '!_[$NF]++' file.log >> final-report.log
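
For readers unfamiliar with the idiom, the one-liner above can be spelled out roughly as follows. This is only a sketch: the array name seen and the file names sample.log and unique.log are assumptions made for illustration, and the sample rows are the four from the question.

```shell
# Sample input: the four rows from the question (the id is the last field).
cat > sample.log <<'EOF'
2012-04-10 09:01:18,196 LOG  (7ysdhsdjfhsdhjkwe:IN) JBOSS1-1 (RP-yedgdh5567) [PayPalWeb] Fetch data with id: 82244956
2012-04-10 09:02:18,196 LOG  (24343sdjjkidgyuwe:IN) JBOSS1-1 (RP-yedgdh5567) [PayPalWeb] Fetch data with id: 82244956
2012-04-10 09:03:18,196 LOG  (6744443jfhsdgyuwe:IN) JBOSS1-1 (RP-yedgdh5567) [PayPalWeb] Fetch data with id: 82244957
2012-04-10 09:04:18,196 LOG  (7ysdhsd5677dgyuwe:IN) JBOSS1-1 (RP-yedgdh5567) [PayPalWeb] Fetch data with id: 82244957
EOF

# seen[$NF] counts how often each id has appeared so far.
# A line is printed only while that count is still zero, i.e. on first sight.
awk '{ if (seen[$NF] == 0) print; seen[$NF]++ }' sample.log > unique.log
```

Because lines are printed as they are read, the output keeps the input order and contains the first row for each id.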

Upvotes: 1

Jonathan Leffler

Reputation: 753545

$1 is the first field, the date. The id is the last field, $NF in awk parlance. So:

awk '{arr[$NF] = $0} END { for (key in arr) { print arr[key] } }' file.log >> final-report.log

This keeps the last record with the given key. To keep the first record, you'd have to do a conditional assignment in the main processing part of the script.
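
The conditional assignment could look something like the sketch below. The array names first and order, the counter n, and the file names sample.log and first-per-id.log are hypothetical, and the sample rows are the four from the question. Since awk does not specify the order in which for (key in arr) visits keys, the sketch also records the order in which ids first appear.

```shell
# Sample input: the four rows from the question (the id is the last field).
cat > sample.log <<'EOF'
2012-04-10 09:01:18,196 LOG  (7ysdhsdjfhsdhjkwe:IN) JBOSS1-1 (RP-yedgdh5567) [PayPalWeb] Fetch data with id: 82244956
2012-04-10 09:02:18,196 LOG  (24343sdjjkidgyuwe:IN) JBOSS1-1 (RP-yedgdh5567) [PayPalWeb] Fetch data with id: 82244956
2012-04-10 09:03:18,196 LOG  (6744443jfhsdgyuwe:IN) JBOSS1-1 (RP-yedgdh5567) [PayPalWeb] Fetch data with id: 82244957
2012-04-10 09:04:18,196 LOG  (7ysdhsd5677dgyuwe:IN) JBOSS1-1 (RP-yedgdh5567) [PayPalWeb] Fetch data with id: 82244957
EOF

# Store a line only if its id has not been stored yet, and remember
# the order of first appearance so END can print in input order.
awk '!($NF in first) { first[$NF] = $0; order[++n] = $NF }
     END { for (i = 1; i <= n; i++) print first[order[i]] }' sample.log > first-per-id.log
```

With the sample rows, first-per-id.log should end up holding the 09:01:18 and 09:03:18 lines, in that order.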

Upvotes: 3
