Reputation: 636
I'm trying to filter out all duplicates from a list, ignoring the first n columns, preferably using awk (but I'm open to other implementations).
I've found a solution for a fixed number of columns, but as I don't know how many columns there will be, I need a range. That solution I found here.
For clarity:
What I'm trying to achieve is an alias for history
that filters out duplicates but leaves the history_id intact, preferably without messing with the order.
The history is in this form
ID DATE HOUR command
5612 2019-07-25 11:58:30 ls /var/log/schaubroeck/audit/2019/May/
5613 2019-07-25 12:00:22 ls /var/log/schaubroeck/
5614 2019-07-25 12:11:30 ls /etc/logrotate.d/
5615 2019-07-25 12:11:35 cat /etc/logrotate.d/samba
5616 2019-07-25 12:11:49 cat /etc/logrotate.d/named
So this command works for commands up to four fields long, but I need to replace the fixed columns with a range to account for all cases:
history | awk -F "[ ]" '!keep[$4 $5 $6 $7]++'
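One way to generalize the fixed fields to a range would be to build the key in a loop over fields 4 through NF (a sketch, assuming default blank-separated fields; `seen` is just an illustrative array name):

```shell
# Sketch: build the dedup key from every field starting at column 4,
# so the filter works for commands with any number of arguments.
history | awk '{k = ""; for (i = 4; i <= NF; i++) k = k OFS $i} !seen[k]++'
```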
I feel @kvantour is getting me on the right path, so I tried:
history | awk '{t=$0;$1=$2=$3=$4="";k=$0;$0=t}_[k]++' | grep cd
But this still yields duplicate lines:
1102 2017-10-27 09:05:07 cd /tmp/
1109 2017-10-27 09:07:03 cd /tmp/
1112 2017-10-27 09:07:15 cd nagent-rhel_64/
1124 2017-11-07 16:38:50 cd /etc/init.d/
1127 2017-12-29 11:13:26 cd /tmp/
1144 2018-06-21 13:04:26 cd /etc/init.d/
1161 2018-06-28 09:53:21 cd /etc/init.d/
1169 2018-07-09 16:33:52 cd /var/log/
1179 2018-07-10 15:54:32 cd /etc/init.d/
Upvotes: 0
Views: 316
Reputation: 37842
You can use sort:
history | sort -u -k4
-u for unique
-k4 to sort only on the columns starting from the fourth.
Running this on
1102 2017-10-27 09:05:07 cd /tmp/
1109 2017-10-27 09:07:03 cd /tmp/
1112 2017-10-27 09:07:15 cd nagent-rhel_64/
1124 2017-11-07 16:38:50 cd /etc/init.d/
1127 2017-12-29 11:13:26 cd /tmp/
1144 2018-06-21 13:04:26 cd /etc/init.d/
1161 2018-06-28 09:53:21 cd /etc/init.d/
1169 2018-07-09 16:33:52 cd /var/log/
1179 2018-07-10 15:54:32 cd /etc/init.d/
yields:
1124 2017-11-07 16:38:50 cd /etc/init.d/
1112 2017-10-27 09:07:15 cd nagent-rhel_64/
1102 2017-10-27 09:05:07 cd /tmp/
1169 2018-07-09 16:33:52 cd /var/log/
EDIT: if you want to keep the order, you might apply a second sort:
history | sort -u -k4 | sort -n
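To check the pipeline on the sample lines above, printf can stand in for history (note: which line of an equal-key run survives -u is implementation-dependent; GNU sort keeps the first input line):

```shell
# Deduplicate on fields 4 onward, then restore the original
# numeric order by sorting on the leading history ID.
printf '%s\n' \
  '1102 2017-10-27 09:05:07 cd /tmp/' \
  '1109 2017-10-27 09:07:03 cd /tmp/' \
  '1112 2017-10-27 09:07:15 cd nagent-rhel_64/' \
  | sort -u -k4 | sort -n
```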
Upvotes: 2
Reputation: 26581
The command you propose will not work as you expect. Imagine you have two lines like:
a b c d 12 13 1
x y z d 1 21 31
Both lines will be considered duplicates, as the key used in the array _ is d12131 for both.
This is probably what you are interested in:
$ history | awk '{t=$0;$1=$2=$3="";k=$0;$0=t}!_[k]++'
Here we store the original record in the variable t. We remove the first three fields of the record by assigning empty values to them; this redefines the record $0, which we store in the key k. Then we reset the record to the value of t. We do the check with the key k, which now holds all fields except the first three.
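Applied to the two problem lines from above, this command keeps both, since their keys now differ (roughly "d 12 13 1" versus "d 1 21 31", each with leading separators from the emptied fields):

```shell
# Both lines share field 4 ("d") but differ in fields 5-7,
# so neither is dropped as a duplicate.
printf 'a b c d 12 13 1\nx y z d 1 21 31\n' \
  | awk '{t = $0; $1 = $2 = $3 = ""; k = $0; $0 = t} !_[k]++'
```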
note: setting the field separator as -F" "
will not set it to a single space, but to any sequence of blanks (spaces and tabs). This is also the default behaviour. If you want a single space, add -F"[ ]"
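The difference is easy to see by counting fields on input that contains two consecutive spaces:

```shell
echo 'a  b' | awk -F" "   '{print NF}'   # runs of blanks collapse: prints 2
echo 'a  b' | awk -F"[ ]" '{print NF}'   # every single space splits: prints 3 (middle field empty)
```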
Upvotes: 2