oneindelijk

Reputation: 636

awk 'uniq' on a range of columns

I'm trying to filter out all duplicates of a list, ignoring the first n columns, preferably using awk (but I'm open to other implementations).

I've found a solution for a fixed number of columns, but since I don't know how many columns there will be, I need a range. I found that solution here.

For clarity: what I'm trying to achieve is an alias for history that filters out duplicates but leaves the history ID intact, preferably without messing with the order. The history is in this form:

ID    DATE       HOUR     command
 5612  2019-07-25 11:58:30 ls /var/log/schaubroeck/audit/2019/May/
 5613  2019-07-25 12:00:22 ls /var/log/schaubroeck/
 5614  2019-07-25 12:11:30 ls /etc/logrotate.d/
 5615  2019-07-25 12:11:35 cat /etc/logrotate.d/samba
 5616  2019-07-25 12:11:49 cat /etc/logrotate.d/named

So this command works for commands up to four words long, but I need to replace the fixed columns with a range to account for all cases:

history | awk -F "[ ]" '!keep[$4 $5 $6 $7]++'

I feel @kvantour is getting me on the right path, so I tried:

history | awk '{t=$0;$1=$2=$3=$4="";k=$0;$0=t}_[k]++' | grep cd

But this still yields duplicate lines:

 1102  2017-10-27 09:05:07 cd /tmp/
 1109  2017-10-27 09:07:03 cd /tmp/
 1112  2017-10-27 09:07:15 cd nagent-rhel_64/
 1124  2017-11-07 16:38:50 cd /etc/init.d/
 1127  2017-12-29 11:13:26 cd /tmp/
 1144  2018-06-21 13:04:26 cd /etc/init.d/
 1161  2018-06-28 09:53:21 cd /etc/init.d/
 1169  2018-07-09 16:33:52 cd /var/log/
 1179  2018-07-10 15:54:32 cd /etc/init.d/

Upvotes: 0

Views: 316

Answers (2)

Chris Maes

Reputation: 37842

You can use sort:

history | sort -u -k4
  • -u for unique
  • -k4 to sort on everything from the fourth field through the end of the line.

Running this on

 1102  2017-10-27 09:05:07 cd /tmp/
 1109  2017-10-27 09:07:03 cd /tmp/
 1112  2017-10-27 09:07:15 cd nagent-rhel_64/
 1124  2017-11-07 16:38:50 cd /etc/init.d/
 1127  2017-12-29 11:13:26 cd /tmp/
 1144  2018-06-21 13:04:26 cd /etc/init.d/
 1161  2018-06-28 09:53:21 cd /etc/init.d/
 1169  2018-07-09 16:33:52 cd /var/log/
 1179  2018-07-10 15:54:32 cd /etc/init.d/

yields:

 1124  2017-11-07 16:38:50 cd /etc/init.d/
 1112  2017-10-27 09:07:15 cd nagent-rhel_64/
 1102  2017-10-27 09:05:07 cd /tmp/
 1169  2018-07-09 16:33:52 cd /var/log/

EDIT: if you want to keep the original order, you can apply a second sort:

history | sort -u -k4 | sort -n
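
Applied to the sample output above, the second, numeric sort puts the surviving lines back in history order, so this should yield:

 1102  2017-10-27 09:05:07 cd /tmp/
 1112  2017-10-27 09:07:15 cd nagent-rhel_64/
 1124  2017-11-07 16:38:50 cd /etc/init.d/
 1169  2018-07-09 16:33:52 cd /var/log/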

Upvotes: 2

kvantour

Reputation: 26581

The command you propose will not work as you expect. Imagine you have two lines like:

a b c d 12 13 1
x y z d 1 21 31

Both lines will be considered duplicates, because the key used in the array _ is d12131 for both.
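
If you do want a fixed number of fields, a common way to avoid such collisions is to put a separator between them. awk's comma syntax for array subscripts joins the values with SUBSEP (a character unlikely to appear in the data), so the two lines above get distinct keys:

history | awk '!keep[$4,$5,$6,$7]++'

This still hardcodes four fields, though, so it does not solve the variable-column problem by itself.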

This is probably what you are interested in:

$ history | awk '{t=$0;$1=$2=$3="";k=$0;$0=t}!_[k]++'

Here we store the original record in the variable t, then remove the first three fields by assigning them empty values. This rebuilds the record $0, which we store in the key k. We then restore $0 from t. The check !_[k]++ uses the key k, which now holds all fields except the first three.
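
An equivalent sketch, assuming the default whitespace field splitting, builds the key with an explicit loop from the fourth field onward instead of modifying $0:

$ history | awk '{k=""; for (i=4; i<=NF; i++) k = k SUBSEP $i} !_[k]++'

Because $0 is never touched, there is no need to save and restore it through t, and it works for any number of columns.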

Note: setting the field separator with -F" " will not set it to a single space, but to any sequence of blanks (spaces and tabs). This is also the default behaviour. If you want a single space, use -F"[ ]".
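
A quick way to see the difference on a line containing two consecutive spaces:

$ echo 'a  b' | awk -F' ' '{print NF}'
2
$ echo 'a  b' | awk -F'[ ]' '{print NF}'
3

With -F'[ ]' the empty field between the two spaces counts as a field, which matters when a key is built from fixed field positions.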

Upvotes: 2
