Reputation: 625
Would like to extract the line items, if the date range between 25-mar-2015 to 05-may-2015
from second field ($2)
.
Date column is not sorted and each files contain millions of records.
Inputs.gz
Des,DateInfo,Amt,Loc,Des2
abc,02-dec-2014,10,def,xyz
abc,20-apr-2015,25,def,xyz
abc,14-apr-2015,40,def,xyz
abc,17-mar-2014,55,def,xyz
abc,24-nov-2011,70,def,xyz
abc,13-may-2015,85,def,xyz
abc,30-sep-2008,100,def,xyz
abc,20-jan-2014,115,def,xyz
abc,04-may-2015,130,def,xyz
abc,25-nov-2013,145,def,xyz
abc,29-mar-2015,55,def,xyz
I have tried like below command and in-complete :
function getDate(date) {
split(date, a, "-");
return mktime(a[3] " " sprintf("%02i",(index("janfebmaraprmayjunjulaugsepoctnovdec", a[2])+2)/3) " " a[1] " 00 00 00")
}
BEGIN {FS=","}
{ if ( getDate($2)>=getDate(25-mar-2015) && getDate($2)<=getDate(05-may-2015) ) print $0 }
Expected Output:
abc,20-apr-2015,25,def,xyz
abc,14-apr-2015,40,def,xyz
abc,04-may-2015,130,def,xyz
abc,29-mar-2015,55,def,xyz
Please suggest ... I dont have perl & python access.
Upvotes: 1
Views: 767
Reputation: 203229
$ cat tst.awk
function getDate(date, a) {
split(date, a, /-/)
return mktime(a[3]" "(index("janfebmaraprmayjunjulaugsepoctnovdec",a[2])+2)/3" "a[1]" 0 0 0")
}
BEGIN {
FS=","
beg = getDate("25-mar-2015")
end = getDate("05-may-2015")
}
{ cur = getDate($2) }
NR>1 && cur>=beg && cur<=end
$ awk -f tst.awk file
abc,20-apr-2015,25,def,xyz
abc,14-apr-2015,40,def,xyz
abc,04-may-2015,130,def,xyz
abc,29-mar-2015,55,def,xyz
Upvotes: 5