Reputation: 1491
I have a list of numbers where the first 8 digits are the date in yyyymmdd format and the remaining digits are the time of day. For each day, I want to select only the number with the maximum timestamp.
20160905092900
20160905212900
20160906092900
20160906213000
20160907093000
20160907213000
20160908093000
20160908213000
20160910093000
20160910213100
20160911093100
20160911213100
20160912093100
That is, for the list above, the output should be the list below.
20160905212900
20160906213000
20160907213000
20160908213000
20160910213100
20160911213100
20160912093100
Upvotes: 1
Views: 27
Reputation: 203324
$ sort -r file | awk '!seen[substr($0,1,8)]++' | sort
20160905212900
20160906213000
20160907213000
20160908213000
20160910213100
20160911213100
20160912093100
If the file's already sorted you can use tac instead of sort.
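To make the three stages of that pipeline explicit, here is a minimal sketch that wraps it in a shell function. The function name latest_per_day is hypothetical (it is not part of the answer above), and the sketch assumes every input line starts with an 8-digit yyyymmdd date followed by a fixed-width time, so that a plain text sort orders the lines chronologically.

```shell
# Hypothetical helper: read timestamps on stdin, print the latest one per day.
# Stage 1 (sort -r): newest lines first, so each day's maximum comes first.
# Stage 2 (awk): seen[key]++ is 0 the first time a date prefix appears,
#                so !seen[key]++ is true only for that first line.
# Stage 3 (sort): restore ascending order.
latest_per_day() {
  sort -r | awk '!seen[substr($0, 1, 8)]++' | sort
}

# Example with three lines taken from the question's data.
printf '%s\n' 20160905092900 20160905212900 20160906092900 | latest_per_day
```

For those three lines the function keeps 20160905212900 and 20160906092900, since the earlier entry for 20160905 is dropped by the awk stage.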
Upvotes: 1
Reputation: 785058
You can use awk:
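awk '{
   dt = substr($0, 1, 8)   # date part, yyyymmdd
   ts = substr($0, 9)      # rest of the line is the time
}
ts > max[dt] {
   max[dt] = ts
   rec[dt] = $0
}
END {
   # for-in visits keys in an unspecified order, hence the final sort
   for (i in rec)
      print rec[i]
}' file | sort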
20160905212900
20160906213000
20160907213000
20160908213000
20160910213100
20160911213100
20160912093100
This uses an associative array max, keyed by the first 8 characters of a line (the date), whose value is the remaining characters (the time). It stores the largest time seen so far for each date. A second array, rec, stores the full line for a date whenever we encounter a time greater than the value stored in max.
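If the input is already sorted in ascending order, as the sample in the question appears to be, the two arrays are not strictly needed. The following is a sketch of a single-pass alternative under that assumption, not the method used above; the function name latest_per_day_sorted and the variables prev and prev_dt are hypothetical.

```shell
# Hypothetical helper: assumes stdin is already sorted ascending.
# The last line seen for a date is then that date's maximum, so we
# print the previous line whenever the date prefix changes.
latest_per_day_sorted() {
  awk '
    { dt = substr($0, 1, 8) }                # date part, yyyymmdd
    NR > 1 && dt != prev_dt { print prev }   # date changed: flush the previous day
    { prev = $0; prev_dt = dt }              # remember the current line
    END { if (NR) print prev }               # flush the final day, if any input
  '
}

printf '%s\n' 20160911093100 20160911213100 20160912093100 | latest_per_day_sorted
```

This keeps only one line in memory and preserves input order, but it gives wrong results on unsorted input, where the array-based version still works.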
Upvotes: 0