Joe's Ideas
Joe's Ideas

Reputation: 550

Sort a CSV column by date with text header and numerical column

I have a CSV file like this:

Order,Item,Date
387-0293,chips,04/26/2016
133-4454,soda,04/25/2016
284-9989,beer,04/27/2016

The amount of orders changes daily.

How can I get the third column to sort by date?

I've tried a bunch of different sort combos, here are some things I tried that I thought would have worked:

cat test.csv | sort -t, -k3 > sorted.csv
cat test.csv | sort -u -r > sorted.csv
cat test.csv | sort -n -k3 > sorted.csv

I'm not super knowledgable with bash, just been searching here and google for answers.

Upvotes: 2

Views: 2689

Answers (2)

Benjamin W.
Benjamin W.

Reputation: 52291

You were quite close:

$ sort -t , -k 3.7n -k 3.1,3.2n -k 3.4,3.5n test.csv -o sorted.csv
Order,Item,Date
133-4454,soda,04/25/2016
387-0293,chips,04/26/2016
284-9989,beer,04/27/2016

The -k option takes up to two parameters where you can specify start and end of a sort field as a one-based string. 3.7n is "from the 7th character of the 3rd field to the end of the line, numerically", 3.1,3.2n is for "1st and 2nd character of the 3rd field, numerically" and so on.

The first sort field starts after the end of the third field in the first line, so it is empty, which gets sorted before the numbers.

Upvotes: 6

jaypal singh
jaypal singh

Reputation: 77135

One way using GNU awk using the built-in mktime and PROCINFO:

awk -F, '
    BEGIN {PROCINFO["sorted_in"] = "@ind_str_asc"}
    NR==1 {print $0; next}
    {   
        split ($3, d, /[/]/); 
        time = mktime (d[3] " " d[1] " " d[2] " 00 00 00"); 
        sorted[time] = ((time in sorted) ? sorted[time] RS $0 : $0)
    } 
    END {for (key in sorted) print sorted[key]}
' file 
Order,Item,Date
133-4454,soda,04/25/2016
387-0293,chips,04/26/2016
284-9989,beer,04/27/2016

Upvotes: 0

Related Questions