VNA

Reputation: 625

awk to extract the data between dates

I would like to extract the lines whose date in the second field ($2) falls between 5 Apr and 10 Apr (inclusive). There are many gzip files in that directory.

Inputs.gz

Des1,DATE,Des1,Des2,Des3
ab,01-APR-15,10,0,4
ab,04-APR-15,25,0,12
ab,05-APR-15,40,0,6
ab,07-APR-15,55,0,6
ab,10-APR-15,70,0,1
ab,11-APR-15,85,0,1

I have tried the command below, but it is incomplete:

zcat Inputs*.gz | awk 'BEGIN{FS=OFS=","} { if ( (substr($2,1,2) >=5) && (substr($2,1,2) <=10) ) print $0 }'  > Output.txt

Expected Output

ab,05-APR-15,40,0,6
ab,07-APR-15,55,0,6
ab,10-APR-15,70,0,1

Please suggest ...
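One likely reason the attempt above misses rows: substr() returns a string, and in POSIX awk (gawk included) comparing a string with a numeric constant is a string comparison, so "07" >= 5 becomes "07" >= "5", which is false. A minimal sketch of a fix, assuming the day is always the first two characters of $2: adding 0 forces a numeric value, and the rest of the field is checked against -APR-15 so other months and years are excluded. The helper names sample and filter_dates are hypothetical, chosen for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: the sample rows from the question.
# On the real files, pipe zcat Inputs*.gz in instead.
sample() {
    printf '%s\n' \
        'Des1,DATE,Des1,Des2,Des3' \
        'ab,01-APR-15,10,0,4' \
        'ab,04-APR-15,25,0,12' \
        'ab,05-APR-15,40,0,6' \
        'ab,07-APR-15,55,0,6' \
        'ab,10-APR-15,70,0,1' \
        'ab,11-APR-15,85,0,1'
}

# Hypothetical helper: print rows whose 2nd field is 05-APR-15 to 10-APR-15.
filter_dates() {
    awk 'BEGIN { FS = OFS = "," }
    {
        day = substr($2, 1, 2) + 0    # "+ 0" turns "05" into the number 5
        rest = substr($2, 3)          # remainder of the field, e.g. "-APR-15"
        if (day >= 5 && day <= 10 && rest == "-APR-15")
            print $0
    }'
}

sample | filter_dates
```

On the real data this would be zcat Inputs*.gz | filter_dates > Output.txt. The header row is skipped naturally, because "DA" + 0 is 0.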

Upvotes: 0

Views: 1046

Answers (2)

Himanshu Ahire

Reputation: 717

Another simple solution, using a regular expression:

awk  -F',' '$2 ~ /([0][5-9]|10)-APR-15/{ print $0  }' txt
  • -F sets the field separator (here a comma).
  • $2 is the second field.
  • ~ tests whether the field matches a regular expression.
  • /([0][5-9]|10)-APR-15/ is the regular expression: 05 to 09, or 10, followed by -APR-15.

Setting the field separator through the FS variable in a BEGIN block instead of -F:

awk   'BEGIN{ FS="," } $2 ~ /([0][5-9]|10)-APR-15/{ print $0  }' txt

Listing each day number explicitly:

awk   'BEGIN{ FS="," } $2 ~ /(05|06|07|08|09|10)-APR-15/{ print $0  }' txt
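The variants above can be tightened a little. A sketch, assuming the same input layout: anchoring the expression with ^ and $ (not part of the original answer) requires the whole second field to match, 0[5-9] is equivalent to [0][5-9], and the { print $0 } action can be dropped because printing the line is what awk does when a pattern has no action. The helper names sample and apr_5_to_10 are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: the sample rows from the question.
sample() {
    printf '%s\n' \
        'Des1,DATE,Des1,Des2,Des3' \
        'ab,01-APR-15,10,0,4' \
        'ab,04-APR-15,25,0,12' \
        'ab,05-APR-15,40,0,6' \
        'ab,07-APR-15,55,0,6' \
        'ab,10-APR-15,70,0,1' \
        'ab,11-APR-15,85,0,1'
}

# Hypothetical helper: keep rows whose whole 2nd field is 05-APR-15 .. 10-APR-15.
# A pattern with no action block prints the matching line.
apr_5_to_10() {
    awk -F',' '$2 ~ /^(0[5-9]|10)-APR-15$/'
}

sample | apr_5_to_10
```

For the gzipped inputs from the question, the same filter would read zcat Inputs*.gz | apr_5_to_10 > Output.txt.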

Upvotes: 1

n0741337

Reputation: 2504

Try this:

awk -F",|-" '$2 >= 5 && $2 <= 10'

It adds the date delimiter to the FS using the -F flag. To ensure that it's APR of 2015, you could separately add tests like:

awk -F",|-" '$2 >= 5 && $2 <= 10 && $3=="APR" && $4==15'

While this makes the date easy to parse up front, if you want to print it out again, you'll need to reconstruct it with something like _date = $2 "-" $3 "-" $4. And if you need to manipulate the data in general, you'd want to add back in the BEGIN {OFS=","} part.

The field numbering I used assumes there are no "-" delimiters in the first field.

I get the following output:

ab,05-APR-15,40,0,6
ab,07-APR-15,55,0,6
ab,10-APR-15,70,0,1
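To illustrate the earlier points about reconstructing the date and adding OFS, here is a sketch that keeps only the name, the date and the first value column of each matching row. That choice of columns is a hypothetical stand-in for "manipulating the data", and the helper names sample and pick_columns are hypothetical too. Because FS also splits on "-", the date has to be rebuilt from $2, $3 and $4, and OFS="," makes print join its arguments with commas.

```shell
#!/bin/sh
# Hypothetical helper: the sample rows from the question.
sample() {
    printf '%s\n' \
        'Des1,DATE,Des1,Des2,Des3' \
        'ab,01-APR-15,10,0,4' \
        'ab,04-APR-15,25,0,12' \
        'ab,05-APR-15,40,0,6' \
        'ab,07-APR-15,55,0,6' \
        'ab,10-APR-15,70,0,1' \
        'ab,11-APR-15,85,0,1'
}

# FS splits on "," or "-", so 05-APR-15 arrives as $2="05", $3="APR", $4="15".
# OFS="," makes print separate its arguments with commas.
pick_columns() {
    awk -F",|-" 'BEGIN { OFS = "," }
    $2 >= 5 && $2 <= 10 && $3 == "APR" && $4 == 15 {
        _date = $2 "-" $3 "-" $4      # rebuild "05-APR-15"
        print $1, _date, $5           # name, date, first value column
    }'
}

sample | pick_columns
```

With the sample rows this prints ab,05-APR-15,40, ab,07-APR-15,55 and ab,10-APR-15,70.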

If you have a whole mess of dates and you really only care about the one in the 2nd comma-delimited field, you could use split like this:

awk -F"," '{ split($2, darr, "-") } darr[1] >= 5 && darr[1] <= 10 && darr[2]=="APR" && darr[3]==15'

which is like saying:

  • for every line, parse the 2nd field into the darr array using - as the delimiter
  • for every line, if the condition darr[1] >= 5 && darr[1] <= 10 && darr[2]=="APR" && darr[3]==15 is true, print the whole line.
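The split approach can also be extended to ranges that cross month boundaries, by turning DD-MON-YY into a sortable YYYYMMDD number. This is a sketch under stated assumptions, not part of the answer: every two-digit year is taken to mean 20YY, months use the upper-case English abbreviations seen in the question, and the helpers sample and in_range (with its from/to arguments) are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: the sample rows from the question.
sample() {
    printf '%s\n' \
        'Des1,DATE,Des1,Des2,Des3' \
        'ab,01-APR-15,10,0,4' \
        'ab,04-APR-15,25,0,12' \
        'ab,05-APR-15,40,0,6' \
        'ab,07-APR-15,55,0,6' \
        'ab,10-APR-15,70,0,1' \
        'ab,11-APR-15,85,0,1'
}

# Hypothetical helper: print rows whose 2nd field (DD-MON-YY) lies in [from, to],
# both given as YYYYMMDD numbers. Assumes YY always means 20YY.
in_range() {
    awk -F',' -v from="$1" -v to="$2" '
    BEGIN {
        # Month m starts at position 3*m - 2 in this string.
        months = "JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"
    }
    split($2, d, "-") == 3 {
        m = (index(months, d[2]) + 2) / 3    # "APR" is at 10, so m = 4
        # A non-integer m means d[2] is not a month abbreviation.
        if (m != int(m))
            next
        key = (2000 + d[3]) * 10000 + m * 100 + d[1]
        if (key >= from + 0 && key <= to + 0)
            print
    }'
}

sample | in_range 20150405 20150410
```

Comparing one YYYYMMDD number avoids testing day, month and year separately, which stops working once a range spans more than one month. The header row is skipped because "DATE" does not split into three parts.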

Upvotes: 2
