Reputation: 41
I need to sort a file using sort in the Linux shell. The sort must be based on the timestamp values contained in each row of the file. The timestamps have an irregular format with no leading zeros on months, days, etc., so the sorts I am performing come out wrong: the format is “M/D/YYYY H:MI:S AM”, so “10/12/2012 12:16:18 PM” comes before “7/24/2012 12:16:18 PM”, which comes before “7/24/2012 12:17:18 AM”.
Is it possible to sort based on timestamps?
I am using the following command to sort my file:
sort -t= -k3 file.txt -o file.txt.sorted
(use the equals sign as the separator => -t=; use the 3rd column as the sort column => -k3)
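To make the misordering concrete, here is a minimal sketch that sorts just the three timestamps quoted above as plain text. The function name lexical_order is hypothetical, and LC_ALL=C is an assumption added so the comparison is byte-wise regardless of locale.

```shell
# Plain byte-wise sort of the three timestamps quoted in the question.
# LC_ALL=C is an assumption made here to keep the comparison locale-independent.
lexical_order() {
    printf '%s\n' \
        '7/24/2012 12:17:18 AM' \
        '10/12/2012 12:16:18 PM' \
        '7/24/2012 12:16:18 PM' |
        LC_ALL=C sort
}

lexical_order
# prints:
# 10/12/2012 12:16:18 PM
# 7/24/2012 12:16:18 PM
# 7/24/2012 12:17:18 AM
```

The character "1" sorts before "7", so October lands ahead of July, and the AM/PM marker is only reached after the clock digits have already decided the order.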
A sample file is as follows:
<r id="abcd" t="10/12/2012 12:16:17 AM"><d><nv n="name" v="868" /><nv n="name0" v="73" /><nv n="name1" v="13815004" /></d></r>
<r id="defg" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="abcd" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="zxy" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="59542676" /></d></r>
<r id="fghj" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="38" /><nv n="name0" v="0" /><nv n="name1" v="3004537" /></d></r>
<r id="defg" t="7/24/2012 12:16:18 AM"><d><nv n="name" v="177" /><nv n="name0" v="0" /><nv n="name1" v="5888870" /></d></r>
Upvotes: 3
Views: 6187
Reputation: 241861
sort is a nice tool, but it doesn't have enough bells and whistles to take pseudo-XML apart, convert an attribute to a sensible time value, and then sort on it.
However, such tools do exist. While the best way to do this would probably be with an XSLT transform, if the file is really as consistent as your example command expects, you could extract the time values with cut -d'"' -f4, and you can convert each one to a more sensible format with date. For example (needs GNU date):
paste <(cut -d'"' -f4 file.txt | date -f- +%s) file.txt | sort -n | cut -f2-
which extracts the date-times, one per line; feeds them to date to convert them to seconds-since-epoch; pastes each timestamp on the beginning of each line; sorts the pasted result numerically, now with numeric timestamps at the beginning, and finally removes the timestamp to get the original file back.
Test:
$ cat >file.txt <<'EOF'
<r id="abcd" t="10/12/2012 12:16:17 AM"><d><nv n="name" v="868" /><nv n="name0" v="73" /><nv n="name1" v="13815004" /></d></r>
<r id="defg" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="abcd" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="zxy" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="59542676" /></d></r>
<r id="fghj" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="38" /><nv n="name0" v="0" /><nv n="name1" v="3004537" /></d></r>
<r id="defg" t="7/24/2012 12:16:18 AM"><d><nv n="name" v="177" /><nv n="name0" v="0" /><nv n="name1" v="5888870" /></d></r>
EOF
$ paste <(cut -d'"' -f4 file.txt | date -f- +%s) file.txt | sort -n | cut -f2-
<r id="defg" t="7/24/2012 12:16:18 AM"><d><nv n="name" v="177" /><nv n="name0" v="0" /><nv n="name1" v="5888870" /></d></r>
<r id="abcd" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="defg" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="fghj" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="38" /><nv n="name0" v="0" /><nv n="name1" v="3004537" /></d></r>
<r id="zxy" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="59542676" /></d></r>
<r id="abcd" t="10/12/2012 12:16:17 AM"><d><nv n="name" v="868" /><nv n="name0" v="73" /><nv n="name1" v="13815004" /></d></r>
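The same pipeline can probably be written without process substitution, which may help in shells that lack <(...). The following is a sketch under the same assumptions as above (GNU date for -f, and the timestamp always being the 4th double-quote-delimited field); the function name sort_by_timestamp is hypothetical.

```shell
# Sort the lines of a file by the timestamp in the 4th "-delimited field.
# Assumes GNU date (for -f) and a consistent attribute layout on every line.
sort_by_timestamp() {
    cut -d'"' -f4 "$1" |    # extract the raw timestamps, one per line
        date -f - +%s |     # convert each one to seconds since the epoch
        paste - "$1" |      # prefix each original line with its epoch value
        sort -n |           # numeric sort on the leading epoch value
        cut -f2-            # drop the epoch column again
}
```

Usage would be along the lines of sort_by_timestamp file.txt > file.txt.sorted. Here paste reads the converted timestamps from standard input (the - argument) instead of from a process substitution; the rest is unchanged.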
Upvotes: 3
Reputation: 9161
The Linux date command does a fine job of parsing dates like this, and it can translate them into more sortable things, like simple Unix-time integers.
Example:
while IFS= read -r line; do
    # pull the value of the t="..." attribute out of the line
    datestring=$(sed -e 's/^.* t="\([^"]*\)".*$/\1/' <<<"$line")
    printf '%s %s\n' "$(date -d "$datestring" +%s)" "$line"
done < file | sort -n
Then you could pass that through an appropriate cut invocation, such as cut -d' ' -f2-, if you want that Unix timestamp removed again.
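Putting the loop and the final cut together might look like the sketch below. It assumes GNU date (for -d) and that each line contains exactly one t="..." attribute; the function name sort_lines_by_t_attr is hypothetical, and printf piped into sed stands in for the here-string so the sketch also runs under a plain POSIX sh.

```shell
# Prefix every line with the epoch value of its t="..." attribute,
# sort numerically on that prefix, then strip the prefix again.
# Assumes GNU date (-d) and one t="..." attribute per line.
sort_lines_by_t_attr() {
    while IFS= read -r line; do
        # extract the attribute value, e.g. 7/24/2012 12:16:17 PM
        datestring=$(printf '%s\n' "$line" | sed -e 's/^.* t="\([^"]*\)".*$/\1/')
        printf '%s %s\n' "$(date -d "$datestring" +%s)" "$line"
    done < "$1" | sort -n | cut -d' ' -f2-
}
```

This starts one date and one sed process per line, so it is likely to be noticeably slower on large files than feeding all timestamps to a single date invocation.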
Upvotes: 4