axnet

Reputation: 5790

How to filter output lines from bash command, based on dates in start of the line?

I am getting the following lines as the output of a bash pipeline:

output
20200604_tsv
20200605_tsv
20200606_tsv
20200706_tsv

I have a date in YYYYMMDD format stored in a variable:

filter_date="20200605"

I want to filter the output lines by date, i.e. keep only the lines whose first part (before the '_') is less than or equal to filter_date.

Expected output:

20200604_tsv
20200605_tsv

How can I achieve this filtering in a bash pipe?

I have tried the following (a lexicographic string comparison), but it does not filter the lines or return the original names.

BASH_CMD_THAT_OUTPUT_LINES | sort | awk '{name = ($1); print name <= "20200605*"}'

## Answer
1
0
0
0

Upvotes: 2

Views: 2183

Answers (4)

kvantour

Reputation: 26481

Awk converts strings to numbers very easily by ignoring any trailing non-numeric part. E.g. the string 123_foo is converted to 123 if you add 0 to it. So the following does what you ask (using <= so that lines dated exactly 20200605 are kept):

command | awk '($0+0 <= 20200605)'

This method works well if you have a sortable date format like YYYYMMDD. If you have a different format such as YYYYDDMM, you first have to convert it into a sortable one. E.g.

command | awk '{d=substr($0,1,4)substr($0,7,2)substr($0,5,2)}(d+0 <= 20200605)'

Note that in this last solution the threshold you compare against must still be written as YYYYMMDD, not YYYYDDMM: 20200605 here means 5 June 2020.
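As a quick check of the numeric coercion, here is a minimal sketch run against the sample lines from the question. The threshold is passed in with -v instead of being hard-coded; the function name filter_up_to is only illustrative and not part of the original answer.

```shell
# Hypothetical helper: keep lines whose leading number is <= the given date.
# "$0 + 0" makes awk read the numeric prefix of the line (20200604_tsv -> 20200604).
filter_up_to() {
  awk -v d="$1" '($0 + 0 <= d + 0)'
}

filter_date="20200605"
result=$(printf '%s\n' 20200604_tsv 20200605_tsv 20200606_tsv 20200706_tsv |
  filter_up_to "$filter_date")
printf '%s\n' "$result"
```

With the sample input this should leave only the 20200604_tsv and 20200605_tsv lines.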

Upvotes: 2

James Brown

Reputation: 37404

Bash only:

while IFS= read -r line
do
  # keep lines that start with 8 digits and whose date is <= the threshold
  [[ $line =~ ^[0-9]{8} ]] && [ "${line::8}" -le 20200605 ] && echo "$line"
done < file  # actually command | while ...
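The same loop can be wrapped in a function so that it sits at the end of a pipe. The sketch below is one possible variant, not the answer's exact code: it takes the text before the first underscore with parameter expansion and uses only POSIX shell features. The names filter_lines, limit and prefix are made up for illustration.

```shell
# Hypothetical helper: read lines on stdin, print those dated on or before $1.
filter_lines() {
  limit=$1
  while IFS= read -r line; do
    prefix=${line%%_*}            # text before the first underscore
    case $prefix in
      ''|*[!0-9]*) continue ;;    # skip lines without a purely numeric prefix
    esac
    if [ "$prefix" -le "$limit" ]; then
      printf '%s\n' "$line"
    fi
  done
  return 0
}

kept=$(printf '%s\n' 20200604_tsv 20200605_tsv 20200606_tsv 20200706_tsv |
  filter_lines 20200605)
printf '%s\n' "$kept"
```

Usage in a real pipeline would look like command | filter_lines "$filter_date".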

Upvotes: 0

axnet

Reputation: 5790

I have found a simple way to match lexicographically. Below are some test data and a simulation of the answer.

## 1. Test data
cat > /tmp/tmp_test_data <<EOF
20200605_tsv
20200607_tsv
20200604_tsv
20200718_tsv
20200606_tsv
EOF

## 2. Threshold date
check_date="20200605"

## 3. Sort, Filter and output
cat /tmp/tmp_test_data \
        | sort \
        | awk -v check_d="${check_date}" '{
            name = $1
            dt = substr(name, 1, 8)   # awk string positions start at 1, not 0
            if (dt <= check_d) {
                print name
            }
          }'
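To make the lexicographic comparison explicit, one option is to append an empty string to both operands, which forces awk to compare them as strings. The sketch below assumes zero-padded, fixed-width YYYYMMDD prefixes, for which string order and date order agree. The variable matched is introduced here only for illustration.

```shell
check_date="20200605"

matched=$(printf '%s\n' 20200605_tsv 20200607_tsv 20200604_tsv 20200718_tsv 20200606_tsv |
  awk -v check_d="$check_date" '
    {
      dt = substr($0, 1, 8)                  # first 8 characters of the line
      if ((dt "") <= (check_d "")) print     # appending "" forces a string comparison
    }' |
  sort)
printf '%s\n' "$matched"
```

Because the filter looks at each line independently, the sort can run after the awk step, where it has fewer lines to process.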

Upvotes: 1

RavinderSingh13

Reputation: 133518

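Could you please try the following, written and tested with the shown samples in GNU awk. It relies on the input being sorted by date, because it exits at the first line dated later than filter_date.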

awk -v filter_date="20200605" '
BEGIN{
  FS=OFS="_"
  filter=mktime(substr(filter_date,1,4)" "substr(filter_date,5,2)" "substr(filter_date,7,2) " 00 00 00")}
{
  curr_dat=mktime(substr($1,1,4)" "substr($1,5,2)" "substr($1,7,2) " 00 00 00")
}
filter<curr_dat{ exit }
1
' Input_file

Explanation: Adding detailed explanation for above.

awk -v filter_date="20200605" '            ##Starting awk program from here and creating awk variable filter_date, the date set by OP up to which we need to get the lines.
BEGIN{                                     ##Starting BEGIN section for this program from here.
  FS=OFS="_"                               ##Setting field separator and output field separator as _ here.
  filter=mktime(substr(filter_date,1,4)" "substr(filter_date,5,2)" "substr(filter_date,7,2) " 00 00 00")}    ##Creating filter variable, which uses mktime on substrings of filter_date to get its value in epoch time.
{
  curr_dat=mktime(substr($1,1,4)" "substr($1,5,2)" "substr($1,7,2) " 00 00 00")     ##Creating curr_dat variable, which uses mktime on substrings of the first field to get the epoch time of the current line.
}
filter<curr_dat{ exit }                    ##Checking condition if filter date is lesser than current line date then exit from program.
1                                          ##1 will print current line, which happens when current line date is either lesser than or equal to filter date.
' Input_file                               ##Mentioning Input_file name here.
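Because the program exits at the first line past the threshold, the input has to be sorted first. The sketch below illustrates this with a plain numeric comparison instead of mktime, so it does not need GNU awk. It is a simplified stand-in for the approach above, and the names take_until, sample, unsorted_result and sorted_result are hypothetical.

```shell
sample='20200605_tsv
20200607_tsv
20200604_tsv
20200718_tsv
20200606_tsv'

# Hypothetical helper: print lines until the first one dated after $1, then stop.
take_until() {
  awk -F_ -v limit="$1" '
    ($1 + 0) > (limit + 0) { exit }   # first line past the threshold ends the run
    1                                 # otherwise print the line unchanged
  '
}

# Unsorted input: awk stops at 20200607 and never sees 20200604.
unsorted_result=$(printf '%s\n' "$sample" | take_until 20200605)

# Sorted input: every line up to and including the threshold is printed.
sorted_result=$(printf '%s\n' "$sample" | sort | take_until 20200605)

printf 'unsorted:\n%s\nsorted:\n%s\n' "$unsorted_result" "$sorted_result"
```

The early exit saves work on long sorted inputs, at the cost of silently dropping lines when the input is not sorted.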

Upvotes: 2
