Ventus

Reputation: 100

Trying to sort a text file with dates in brackets with "sort"

I'm trying to sort a text file by date. My file format is:

...
[15/08/2019 - 01:58:49] some text here
[15/08/2019 - 02:21:23] more text here
[15/08/2019 - 02:56:11] blah blah blah
...

I've tried multiple different methods with the sort command.

One attempt: "sort -b --key=1n --debug Final_out.txt"

sort: using ‘en_US.UTF-8’ sorting rules
sort: key 1 is numeric and spans multiple fields
sort: option '-b' is ignored

^ no match for key
^ no match for key
...
__
.?
^ no match for key
__
.?
^ no match for key
__
sort: write failed: 'standard output': Input/output error
sort: write error

Second attempt: "sort -n -b --key=10,11 --debug Final_out.txt". This produced the same output as above.

I'm just about to tear my hair out. This has to be possible, it's Linux! Could someone kindly give me some pointers?

Upvotes: 2

Views: 338

Answers (3)

Pct Mtnxt

Reputation: 65

I have the same issue with my shell history, with HISTTIMEFORMAT="%d/%m/%y %T "

To sort by year, month and day, I used these options with sort:

  • before
history | awk '/0[78]\/06/{print" "$1"  "$2" "$3" command number "NR}'|head -20
 1921  07/06/22 09:21:05 command number 925
 1922  07/06/22 13:23:31 command number 926
 1923  07/06/22 13:24:16 command number 927
 1924  07/06/22 13:23:31 command number 928
 1925  07/06/22 13:24:16 command number 929
 1926  08/06/22 10:59:12 command number 930
 1927  08/06/22 10:59:21 command number 931
 1928  08/06/22 10:59:26 command number 932
 1929  08/06/22 10:59:27 command number 933
 1930  08/06/22 10:59:34 command number 934
 1931  08/06/22 10:59:44 command number 935
 1932  08/06/22 11:01:47 command number 936
 1933  08/06/22 11:03:35 command number 937
 1934  08/06/22 11:03:44 command number 938
 1935  08/06/22 11:03:48 command number 939
 1936  08/06/22 11:04:02 command number 940
 1937  08/06/22 11:12:17 command number 941
 1938  07/06/22 13:24:16 command number 942
 1939  08/06/22 09:22:10 command number 943
 1940  08/06/22 09:29:41 command number 944
  • after
history | awk '/0[78]\/06/{print" "$1"  "$2" "$3" command number "NR}'|head -20|sort -bn -k2.7,2.8 -k2.4,2.5 -k2.1,2.2 -k3.1,3.2 -k3.4,3.5 -k3.7,3.8 -k1
 1921  07/06/22 09:21:05 command number 925
 1922  07/06/22 13:23:31 command number 926
 1924  07/06/22 13:23:31 command number 928
 1923  07/06/22 13:24:16 command number 927
 1925  07/06/22 13:24:16 command number 929
 1938  07/06/22 13:24:16 command number 942
 1939  08/06/22 09:22:10 command number 943
 1940  08/06/22 09:29:41 command number 944
 1926  08/06/22 10:59:12 command number 930
 1927  08/06/22 10:59:21 command number 931
 1928  08/06/22 10:59:26 command number 932
 1929  08/06/22 10:59:27 command number 933
 1930  08/06/22 10:59:34 command number 934
 1931  08/06/22 10:59:44 command number 935
 1932  08/06/22 11:01:47 command number 936
 1933  08/06/22 11:03:35 command number 937
 1934  08/06/22 11:03:44 command number 938
 1935  08/06/22 11:03:48 command number 939
 1936  08/06/22 11:04:02 command number 940
 1937  08/06/22 11:12:17 command number 941

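Explanation of the sort -bn -k2.7,2.8 -k2.4,2.5 -k2.1,2.2 -k3.1,3.2 -k3.4,3.5 -k3.7,3.8 -k1 command:

  • b ignores leading blanks when locating each key
  • n compares numerically
  • k2.7,2.8 uses characters 7 to 8 of the 2nd field (the date), i.e. the year (yy)
  • and so on for the remaining keys on fields 2 (month, day) and 3 (hours, minutes, seconds); the final -k1 breaks ties on the history number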

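And, for @Ventus, the solution can be sort -b -n -k1.8,1.11 -k1.5,1.6 -k1.2,1.3 -k3.1,3.2 -k3.4,3.5 -k3.7,3.8. In the first field, [DD/MM/YYYY, the year occupies characters 8 to 11, and -b is needed so that the character offsets in the third field are not shifted by its leading blank.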
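
As a quick check of the key positions against the question's format, here is a minimal sketch. It assumes GNU sort and that every line starts with [DD/MM/YYYY - HH:MM:SS]; the function name sort_bracket_dates is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: sort lines that start with "[DD/MM/YYYY - HH:MM:SS]".
# Field 1 is "[DD/MM/YYYY", field 2 is "-", field 3 is "HH:MM:SS]".
sort_bracket_dates() {
    # -b skips the blanks in front of fields 2 and 3, so the character
    # offsets in field 3 count from the first digit of the hour.
    # Key order: year, month, day, hour, minute, second.
    sort -b -n -k1.8,1.11 -k1.5,1.6 -k1.2,1.3 \
               -k3.1,3.2 -k3.4,3.5 -k3.7,3.8 "$@"
}

# Sample lines taken from the question and the other answers.
sort_bracket_dates <<'EOF'
[10/01/2020 - 01:23:45] lorem ipsum
[15/08/2019 - 02:21:23] more text here
[15/08/2019 - 01:58:49] some text here
[14/08/2019 - 12:34:56] dolor sit amet
EOF
```

With a real file, the call would be sort_bracket_dates Final_out.txt.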

Upvotes: 0

anubhava

Reputation: 785531

Here is a shorter alternative using GNU awk:

cat file
[10/01/2020 - 01:23:45] lorem ipsum
[15/08/2019 - 02:21:23] more text here
[15/08/2019 - 02:56:11] blah blah blah
[15/08/2019 - 01:58:49] some text here
[14/08/2019 - 12:34:56] dolor sit amet

Use this awk:

awk -v FPAT='[0-9:]+' '{ map[$3,$2,$1,$4] = $0 } 
END { PROCINFO["sorted_in"]="@ind_str_asc"; for (k in map) print map[k] }' file
[14/08/2019 - 12:34:56] dolor sit amet
[15/08/2019 - 01:58:49] some text here
[15/08/2019 - 02:21:23] more text here
[15/08/2019 - 02:56:11] blah blah blah
[10/01/2020 - 01:23:45] lorem ipsum
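
One thing to watch for: because the array is indexed by the timestamp alone, two lines with an identical timestamp would share an index and only the last one would be printed. A possible variant, sketched here under the assumption that gawk is installed under that name, appends the zero-padded record number to the index; the function name sort_keep_duplicates is hypothetical.

```shell
#!/bin/bash
# Hypothetical variant of the gawk command that keeps lines sharing a timestamp.
# Requires GNU awk: FPAT and PROCINFO["sorted_in"] are gawk extensions.
sort_keep_duplicates() {
    gawk -v FPAT='[0-9:]+' '
        # Fields are day, month, year, time. The zero-padded NR makes every
        # index unique and keeps input order among equal timestamps.
        { map[$3, $2, $1, $4, sprintf("%012d", NR)] = $0 }
        END {
            PROCINFO["sorted_in"] = "@ind_str_asc"
            for (k in map) print map[k]
        }' "$@"
}
```

It would be called the same way as the original command, for example sort_keep_duplicates file.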

Upvotes: 2

tshiono

Reputation: 22032

As Shawnn suggests, how about a bash solution:

#!/bin/bash

pat='^\[([0-9]{2})/([0-9]{2})/([0-9]{4})[[:blank:]]+-[[:blank:]]+([0-9]{2}:[0-9]{2}:[0-9]{2})\]'
while IFS= read -r line; do
    if [[ $line =~ $pat ]]; then
        m=( "${BASH_REMATCH[@]}" )      # make a copy just to shorten the variable name
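        # printf leaves any backslashes in $line untouched (echo -e would interpret them)
        printf '%s%s%s_%s\t%s\n' "${m[3]}" "${m[2]}" "${m[1]}" "${m[4]}" "$line"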
    fi
done < file.txt | sort -t $'\t' -k1,1 | cut -f2-
  • The variable pat is a regular expression that matches the date and time field; its capture groups populate the bash array BASH_REMATCH with day, month, year and time, in that order.
  • After extracting the date and time, the loop builds a new string composed of year, month, day and time in a sortable order and prepends it to the current line, separated by a tab.
  • The decorated lines are then piped to sort, keyed on the 1st field.
  • Finally the 1st field is cut off.

The input file file.txt:

[10/01/2020 - 01:23:45] lorem ipsum
[15/08/2019 - 02:21:23] more text here
[15/08/2019 - 02:56:11] blah blah blah
[15/08/2019 - 01:58:49] some text here
[14/08/2019 - 12:34:56] dolor sit amet

Output:

[14/08/2019 - 12:34:56] dolor sit amet
[15/08/2019 - 01:58:49] some text here
[15/08/2019 - 02:21:23] more text here
[15/08/2019 - 02:56:11] blah blah blah
[10/01/2020 - 01:23:45] lorem ipsum
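
The same decorate, sort and undecorate steps can also be sketched as a single pipeline, with sed building the key instead of a bash loop. This is only an illustration, assuming a sed that supports -E (GNU and BSD sed both do) and a single-space layout tolerant of extra spaces around the dash; the function name sort_by_prefix_key is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: prepend a YYYYMMDD_HH:MM:SS key to each line,
# sort on that key, then strip the key again.
sort_by_prefix_key() {
    local tab=$'\t'
    # -n with the p flag prints only lines that start with
    # "[DD/MM/YYYY - HH:MM:SS]", so other lines are dropped, as in the loop.
    # "&" re-inserts the matched timestamp after the key and the tab.
    sed -nE "s|^\[([0-9]{2})/([0-9]{2})/([0-9]{4}) +- +([0-9]{2}:[0-9]{2}:[0-9]{2})\]|\3\2\1_\4${tab}&|p" "$@" |
        LC_ALL=C sort -t "$tab" -k1,1 |
        cut -f2-
}
```

Usage would be sort_by_prefix_key file.txt, or the function can read from a pipe.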

Upvotes: 3
