Reputation: 23522
Sample file:
# cat test1
-rw-r--r-- 1 root root 19460 Feb 10 03:56 catalina.2015-02-10.log
-rw-r--r-- 1 root root 206868 May 4 15:05 catalina.2015-05-04.log
-rw-r--r-- 1 root root 922121 Jun 24 09:26 catalina.out
-rw-r--r-- 1 root root 0 Feb 10 02:27 host-manager.2015-02-10.log
-rw-r--r-- 1 root root 0 May 4 04:17 host-manager.2015-05-04.log
-rw-r--r-- 1 root root 2025 Feb 10 03:56 localhost.2015-02-10.log
-rw-r--r-- 1 root root 8323 May 4 15:05 localhost.2015-05-04.log
-rw-r--r-- 1 root root 873 Feb 10 03:56 localhost_access_log.2015-02-10.txt
-rw-r--r-- 1 root root 458600 May 4 23:59 localhost_access_log.2015-05-04.txt
-rw-r--r-- 1 root root 0 Feb 10 02:27 manager.2015-02-10.log
-rw-r--r-- 1 root root 0 May 4 04:17 manager.2015-05-04.log
Expected Output:
catalina
host-manager
localhost
localhost_access_log
manager
Attempt 1 (works):
# awk '{split($9,a,"."); print a[1]}' test1 | awk '!z[$i]++'
catalina
host-manager
localhost
localhost_access_log
manager
Attempt 2 (works):
# awk '{split($9,a,"."); print a[1]}' test1 | uniq
catalina
host-manager
localhost
localhost_access_log
manager
Attempt 3 (Fails):
# awk '{split($9,a,"."); a[1]++} {for (i in a){print a[i]}}' test1
1
2015-02-10
log
1
2015-05-04
log
1
out
.
.
.
Question:
I wanted to split the 9th field and then display only the unique entries. However, I wanted to do this in a single awk one-liner. Seeking help with my 3rd attempt.
Upvotes: 1
Views: 238
Reputation: 1307
Another, more idiomatic awk one-liner:
awk '!a[ $0 = substr($NF,1,index($NF,".")-1) ]++' file
or, expressed more explicitly:
awk '{$0=substr($NF,1,index($NF,".")-1)} !a[$0]++' file
This relies on the well-known !a[$0]++ line de-duplication trick, after first setting $0 to substr($NF,1,index($NF,".")-1): the last field $NF up to (but not including) the first dot (.), extracted with substr() and some help from index().
A benefit of this solution is that you don't need to wait until the whole file has been parsed. The split fields are de-duplicated and printed on the fly.
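To illustrate the two pieces (these examples are not part of the original answer and assume a POSIX-compatible awk): the substr()/index() expression keeps everything before the first dot, and the !a[$0]++ idiom prints each resulting line only the first time it is seen.
# echo "catalina.2015-02-10.log" | awk '{print substr($NF, 1, index($NF, ".") - 1)}'
catalina
# printf 'catalina\nmanager\ncatalina\n' | awk '!a[$0]++'
catalina
manager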
Upvotes: 5
Reputation: 290515
You have to use the END block to print the results:
awk '{split($NF,a,"."); b[a[1]]} END{for (i in b){print i}}' file
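If you need the names in a fixed order, you can pipe the result through sort (a sketch, not part of the original answer; awk's for (i in b) loop visits keys in an unspecified order, so the raw output may come out unsorted):
# awk '{split($NF,a,"."); b[a[1]]} END{for (i in b){print i}}' test1 | sort
catalina
host-manager
localhost
localhost_access_log
manager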
Notes:
- Use $NF to catch the last field. This way, if you happen to have more or fewer fields than 9, it will still work (as long as there are no filenames with spaces, because parsing ls is evil).
- You cannot print the a[] array directly, because it is the one containing the split data. For this we need to create another array, for example b[]. That's why we say b[a[1]]. There is no need for b[a[1]]++ unless you want to keep track of how many times each item appears.
- The END block is executed after processing the whole file. Otherwise you were going through the results once per record (that is, once per line), and consequently duplicates were appearing.
Upvotes: 2