speed bump

Reputation: 451

How to use the sort and awk commands to sort by the dates in the 4th column of a file

I have the following file called st.txt:

Item    Type    Amount  Date
Petrol  expense -160    2020-01-23
Electricity expense -200    2020-03-24
Electricity expense -200    2020-04-24
Trim line   expense -50 2020-05-30
Martha Burns    income  150 2021-03-11
Highbury shops  income  300 2021-03-14

I want to sort the data by date and print all data except the first line. The following command works:

awk -F '\t' 'NR>1{print $4"\t"$1"\t"$2"\t"$3}' st.txt | sort -t"-" -n -k1 -k2 -k3

The output then is:

2020-01-23  Petrol  expense -160
2020-03-24  Electricity expense -200
2020-04-24  Electricity expense -200
2020-05-30  Trim line   expense -50
2021-03-11  Martha Burns    income  150
2021-03-14  Highbury shops  income  300

How can I write this command so that I do not have to rearrange the columns and the date field stays at $4? I tried the following, but it does not work:

awk -F '\t' 'NR>1{print $0}' st.txt | sort -t"-" -n -k 4,1 -k 4,2 -k 4,3

The dates are not sorted with this command.

The output should be:

Petrol  expense -160    2020-01-23
Electricity expense -200    2020-03-24
Electricity expense -200    2020-04-24
Trim line   expense -50 2020-05-30
Martha Burns    income  150 2021-03-11
Highbury shops  income  300 2021-03-14

Upvotes: 1

Views: 1259

Answers (3)

dawg

Reputation: 104024

Given:

$ awk '{gsub(/\t/,"\\t")} 1' file
Item\tType\tAmount\tDate
Petrol\texpense\t-160\t2020-01-23
Electricity\texpense\t-200\t2020-03-24
Electricity\texpense\t-200\t2020-04-24
Trim line\texpense\t-50\t2020-05-30
Martha Burns\tincome\t150\t2021-03-11
Highbury shops\tincome\t300\t2021-03-14

You can either use the Decorate / Sort / Undecorate pattern with POSIX awk:

awk 'BEGIN{FS=OFS="\t"} FNR>1{print $4, $0}' file | sort | cut -f 2-

Or use a proper CSV parser set to use a \t instead of a comma. Ruby is the easiest:

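ruby -r csv -e '
# Parse tab-separated input; return_headers keeps the header as the first row
options={:col_sep=>"\t", :headers=>true, :return_headers=>true}
data=CSV.parse($<.read, **options).to_a
# Remove the header row so it is not sorted or printed
header=data.shift.to_csv(**options)
# ISO dates in column 4 (index 3) sort chronologically as strings
data.sort_by{|r| r[3]}.each{|r| puts r.to_csv(**options)}
' file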

Either prints:

Petrol  expense -160    2020-01-23
Electricity expense -200    2020-03-24
Electricity expense -200    2020-04-24
Trim line   expense -50 2020-05-30
Martha Burns    income  150 2021-03-11
Highbury shops  income  300 2021-03-14
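
To see what each stage of the decorate / sort / undecorate pipeline does, here is a minimal sketch on a single made-up row held in a shell variable (the variable names are illustrative and not part of the command above):

```shell
# One tab-separated sample row (illustrative data).
line=$(printf 'Petrol\texpense\t-160\t2020-01-23')

# Decorate: copy the date (field 4) to the front so a plain sort can order on it.
decorated=$(printf '%s\n' "$line" | awk 'BEGIN{FS=OFS="\t"} {print $4, $0}')
printf '%s\n' "$decorated"

# Undecorate: drop the leading copy of the date, restoring the original row.
undecorated=$(printf '%s\n' "$decorated" | cut -f 2-)
printf '%s\n' "$undecorated"
```

The sort step in between needs no options because the key is now the first thing on each line.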

Upvotes: 0

Cyrus

Reputation: 88766

With GNU awk:

awk -F '\t' 'NR>1{a[$4]=$0} END{PROCINFO["sorted_in"] = "@ind_str_asc"; for(i in a){print a[i]}}' file

Output:

Petrol  expense -160    2020-01-23
Electricity     expense -200    2020-03-24
Electricity     expense -200    2020-04-24
Trim line       expense -50     2020-05-30
Martha Burns    income  150     2021-03-11
Highbury shops  income  300     2021-03-14
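
One caveat worth checking against your own data: because the array is keyed on `$4` alone, two rows with the same date would collapse into one. The sketch below uses made-up rows to show the effect and one possible workaround, a composite key of date plus line number (`$4 FS NR`). The workaround is an assumption on my part and is not in the command above. The final step runs only if `gawk` is installed.

```shell
# Three rows, two of which share the date 2020-03-24 (made-up sample data).
rows=$(printf 'A\texpense\t-1\t2020-03-24\nB\texpense\t-2\t2020-03-24\nC\tincome\t3\t2020-01-01\n')

# Keyed on $4 alone: the second 2020-03-24 row overwrites the first.
kept=$(printf '%s\n' "$rows" | awk -F '\t' '{a[$4]=$0} END{n=0; for(k in a) n++; print n}')
echo "rows kept with key \$4: $kept"

# Keyed on date plus line number: every row gets its own slot.
kept_all=$(printf '%s\n' "$rows" | awk -F '\t' '{a[$4 FS NR]=$0} END{n=0; for(k in a) n++; print n}')
echo "rows kept with key \$4 FS NR: $kept_all"

# With GNU awk available, the composite key still orders rows by date first.
if command -v gawk >/dev/null 2>&1; then
  printf '%s\n' "$rows" | gawk -F '\t' '{a[$4 FS NR]=$0} END{PROCINFO["sorted_in"]="@ind_str_asc"; for(i in a) print a[i]}'
fi
```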

Upvotes: 3

Ed Morton

Reputation: 203995

Assuming the fields in your input file are tab-separated as your code suggests they are:

$ tail -n +2 file | sort -t$'\t' -k4
Petrol  expense -160    2020-01-23
Electricity     expense -200    2020-03-24
Electricity     expense -200    2020-04-24
Trim line       expense -50     2020-05-30
Martha Burns    income  150     2021-03-11
Highbury shops  income  300     2021-03-14
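
For context on why the attempt in the question does not sort the dates: `-t"-"` makes the hyphen the field separator, and `-k 4,1` is read as a key running from field 4 to field 1, not as the first part of field 4. Since ISO dates (YYYY-MM-DD) sort chronologically as plain strings, a single key on the tab-separated date field is enough. Here is a self-contained sketch of that idea, assuming a temporary sample file created with `mktemp`. It uses `-k4,4` to limit the key to the fourth field explicitly and builds the tab with `printf` for shells that lack `$'\t'`.

```shell
# Build a small tab-separated sample in a temporary file (illustrative data).
demo=$(mktemp)
printf 'Item\tType\tAmount\tDate\n'               >  "$demo"
printf 'Martha Burns\tincome\t150\t2021-03-11\n'  >> "$demo"
printf 'Petrol\texpense\t-160\t2020-01-23\n'      >> "$demo"
printf 'Trim line\texpense\t-50\t2020-05-30\n'    >> "$demo"

# A literal tab, for shells without $'\t' support.
tab=$(printf '\t')

# Skip the header line, then sort on field 4 only (-k4,4) with tab as separator.
sorted=$(tail -n +2 "$demo" | sort -t "$tab" -k4,4)
printf '%s\n' "$sorted"

rm -f "$demo"
```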

Upvotes: 3
