slayedbylucifer

Reputation: 23522

Split a field and then remove duplicates

Sample file:

# cat test1 
-rw-r--r-- 1 root root   19460 Feb 10 03:56 catalina.2015-02-10.log
-rw-r--r-- 1 root root  206868 May  4 15:05 catalina.2015-05-04.log
-rw-r--r-- 1 root root  922121 Jun 24 09:26 catalina.out
-rw-r--r-- 1 root root       0 Feb 10 02:27 host-manager.2015-02-10.log
-rw-r--r-- 1 root root       0 May  4 04:17 host-manager.2015-05-04.log
-rw-r--r-- 1 root root    2025 Feb 10 03:56 localhost.2015-02-10.log
-rw-r--r-- 1 root root    8323 May  4 15:05 localhost.2015-05-04.log
-rw-r--r-- 1 root root     873 Feb 10 03:56 localhost_access_log.2015-02-10.txt
-rw-r--r-- 1 root root  458600 May  4 23:59 localhost_access_log.2015-05-04.txt
-rw-r--r-- 1 root root       0 Feb 10 02:27 manager.2015-02-10.log
-rw-r--r-- 1 root root       0 May  4 04:17 manager.2015-05-04.log

Expected Output:

catalina
host-manager
localhost
localhost_access_log
manager

Attempt 1 (works):

# awk '{split($9,a,"."); print a[1]}' test1 | awk '!z[$i]++'
catalina
host-manager
localhost
localhost_access_log
manager

Attempt 2 (works):

# awk '{split($9,a,"."); print a[1]}' test1 | uniq
catalina
host-manager
localhost
localhost_access_log
manager

Attempt 3 (Fails):

# awk '{split($9,a,"."); a[1]++} {for (i in a){print a[i]}}' test1
1
2015-02-10
log
1
2015-05-04
log
1
out
.
.
.

Question:

I want to split the 9th field and then display only the unique entries, and I want to do this in a single awk one-liner. I am looking for help with my 3rd attempt.

Upvotes: 1

Views: 238

Answers (2)

henfiber

Reputation: 1307

Another, more idiomatic awk one-liner:

awk '!a[ $0 = substr($NF,1,index($NF,".")-1) ]++' file

or, expressed more explicitly:

awk '{$0=substr($NF,1,index($NF,".")-1)} !a[$0]++' file
  • We use the well-known !a[$0]++ line de-duplication trick.
  • But first we change $0 to substr($NF,1,index($NF,".")-1).
    • The whole line becomes the substring of the last field $NF up to the first dot (.), using substr() with some help from index().

A benefit of this solution is that you don't need to wait until the whole file has been parsed. Each name is de-duplicated and printed on the fly, the first time it is seen.
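The same on-the-fly idea can also be written with split(), which stays closer to the question's third attempt. This is only a sketch: the `sample` variable holds a shortened, hypothetical copy of the listing, and the array name `seen` is an arbitrary choice, not something from the answers.

```shell
# Hypothetical, shortened copy of the sample listing from the question.
sample=$(cat <<'EOF'
-rw-r--r-- 1 root root   19460 Feb 10 03:56 catalina.2015-02-10.log
-rw-r--r-- 1 root root  206868 May  4 15:05 catalina.2015-05-04.log
-rw-r--r-- 1 root root  922121 Jun 24 09:26 catalina.out
-rw-r--r-- 1 root root       0 Feb 10 02:27 manager.2015-02-10.log
-rw-r--r-- 1 root root       0 May  4 04:17 manager.2015-05-04.log
EOF
)

# Split the last field on "." for every line, then print a[1]
# only the first time that value is seen (seen[...] is still 0).
result=$(printf '%s\n' "$sample" |
  awk '{split($NF, a, ".")} !seen[a[1]]++ {print a[1]}')

printf '%s\n' "$result"
```

Unlike the substr() version, this leaves $0 untouched, which may matter if the rest of the line is needed later in the same script.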

Upvotes: 5

fedorqui

Reputation: 290515

You have to use the END block to print the results:

awk '{split($NF,a,"."); b[a[1]]} END{for (i in b){print i}}' file

Notes:

  • I am using $NF to catch the last field. This way it also works if a line has more or fewer than 9 fields (as long as no filename contains spaces, because parsing ls is evil).
  • We cannot loop directly over the a[] array, because it only holds the split pieces of the current line. Instead we need a second array, for example b[], keyed by the first piece; that is why we write b[a[1]]. Merely referencing b[a[1]] creates the element, so b[a[1]]++ is needed only if you want to keep track of how many times each item appears.
  • The END block is executed after the whole file has been processed. In your attempt the loop ran once per record (that is, once per line), which is why duplicates appeared.
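If the counts mentioned above are wanted, a minimal sketch could look like the following. The `sample` variable is a shortened, hypothetical copy of the listing. The trailing `sort` is an addition of this sketch: awk does not guarantee any particular order for `for (i in b)`, so sorting makes the output predictable.

```shell
# Hypothetical, shortened copy of the sample listing from the question.
sample=$(cat <<'EOF'
-rw-r--r-- 1 root root   19460 Feb 10 03:56 catalina.2015-02-10.log
-rw-r--r-- 1 root root  206868 May  4 15:05 catalina.2015-05-04.log
-rw-r--r-- 1 root root  922121 Jun 24 09:26 catalina.out
-rw-r--r-- 1 root root       0 Feb 10 02:27 manager.2015-02-10.log
-rw-r--r-- 1 root root       0 May  4 04:17 manager.2015-05-04.log
EOF
)

# b[a[1]]++ counts each prefix; the END block prints name and count
# once, after all lines have been read.
counts=$(printf '%s\n' "$sample" |
  awk '{split($NF, a, "."); b[a[1]]++} END {for (i in b) print i, b[i]}' |
  LC_ALL=C sort)

printf '%s\n' "$counts"
```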

Upvotes: 2
