Reputation: 5790
I am getting the following lines as the output of a bash pipeline:
20200604_tsv
20200605_tsv
20200606_tsv
20200706_tsv
I have a date in YYYYMMDD format stored in a variable:
filter_date="20200605"
I want to filter the output lines by date, i.e. keep only the lines whose first part (before the '_') is less than or equal to filter_date.
Expected output:
20200604_tsv
20200605_tsv
How can I do this filtering in a bash pipeline?
I have tried the following (a lexicographic string comparison), but it prints the result of the comparison instead of the original names:
BASH_CMD_THAT_OUTPUT_LINES | sort | awk '{name = ($1); print name <= "20200605*"}'
## Actual output
1
0
0
0
Upvotes: 2
Views: 2183
Reputation: 26481
Awk converts strings to numbers very easily by ignoring any non-numeric trailing part. E.g. the string 123_foo is converted to 123 if you add 0 to it. So the following does what you request:
command | awk '($0+0 <= 20200605)'
This method works well if you have a sortable date format like YYYYMMDD. If you have a different format such as YYYYDDMM, you first have to convert it to a sortable one, e.g.:
command | awk '{d=substr($0,1,4)substr($0,7,2)substr($0,5,2)}(d+0 <= 20200605)'
Note that in the last solution the threshold stays in YYYYMMDD order, because d has already been rearranged: 20200605 is YYYYMMDD, not YYYYDDMM.
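To make the coercion trick reusable, here is a minimal sketch that passes the threshold in from a shell variable with `-v` instead of hard-coding it. The function name `filter_le_date` and the awk variable `limit` are made up for illustration. One caveat: a line that does not start with digits coerces to 0 and would therefore pass the filter.

```shell
# Hypothetical helper: keep stdin lines whose leading YYYYMMDD is <= the date in $1.
filter_le_date() {
  # "20200604_tsv" + 0 evaluates to 20200604 in awk, so the comparison is numeric.
  # A pattern without an action prints the matching line unchanged.
  awk -v limit="$1" '($0 + 0) <= (limit + 0)'
}

filter_date="20200605"
printf '%s\n' 20200604_tsv 20200605_tsv 20200606_tsv 20200706_tsv |
  filter_le_date "$filter_date"
```

With the sample lines from the question this should print only 20200604_tsv and 20200605_tsv.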
Upvotes: 2
Reputation: 37404
Bash only:
while IFS= read -r line
do
  [[ $line =~ ^[0-9]{8} ]] && [ "${line:0:8}" -le 20200605 ] && printf '%s\n' "$line"
done < file # in practice: command | while ...
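The same loop can also be written with only POSIX shell features, in case the script has to run under `sh` as well. This is a sketch, not a drop-in replacement: the function name `filter_le_date_sh` is hypothetical, and it splits at the first underscore with `${line%%_*}` rather than slicing a fixed width.

```shell
# Hypothetical helper: read lines on stdin, print those whose prefix before
# the first "_" is an 8-digit date less than or equal to $1.
filter_le_date_sh() {
  limit=$1
  while IFS= read -r line; do
    prefix=${line%%_*}            # everything before the first underscore
    case $prefix in
      [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;   # exactly 8 digits
      *) continue ;;                                 # skip anything else
    esac
    if [ "$prefix" -le "$limit" ]; then
      printf '%s\n' "$line"
    fi
  done
}

printf '%s\n' 20200604_tsv 20200605_tsv 20200606_tsv header_line |
  filter_le_date_sh 20200605
```

Unlike the awk coercion approach, lines without a valid date prefix are skipped instead of being treated as 0.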
Upvotes: 0
Reputation: 5790
I have found a simple way to match lexicographically. Below are the test data and a simulation of the answer.
## 1. Test data
cat > /tmp/tmp_test_data <<EOF
20200605_tsv
20200607_tsv
20200604_tsv
20200718_tsv
20200606_tsv
EOF
## 2. Threshold date
check_date="20200605"
## 3. Sort, Filter and output
cat /tmp/tmp_test_data \
| sort \
| awk -v check_d="${check_date}" '{
    name = $1
    # awk strings are 1-indexed; with a start of 0 many awks return only 7 characters
    dt = substr(name, 1, 8)
    if (dt <= check_d)
        print name
}'
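Lexicographic comparison works here because YYYYMMDD is fixed-width and zero-padded, so string order and date order agree. A shorter sketch of the same idea lets awk split on the underscore and forces a string comparison by concatenating an empty string. The function name `filter_lexicographic` and the awk variable `d` are assumptions for illustration. No `sort` is needed, since every line is tested on its own.

```shell
check_date="20200605"

# Hypothetical helper: -F_ makes the date the first awk field.
# Inside the single quotes $1 is the awk field; "$1" outside is the shell argument.
filter_lexicographic() {
  # Appending "" turns both operands into strings, so <= compares lexicographically.
  awk -F_ -v d="$1" '($1 "") <= (d "")'
}

printf '%s\n' 20200605_tsv 20200607_tsv 20200604_tsv 20200718_tsv 20200606_tsv |
  filter_lexicographic "$check_date"
```

With the test data above this should keep 20200605_tsv and 20200604_tsv, in their original order.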
Upvotes: 1
Reputation: 133518
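Could you please try the following, written and tested with the shown samples in GNU awk. It assumes the input is sorted by date, because it exits at the first line that is later than filter_date.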
awk -v filter_date="20200605" '
BEGIN{
FS=OFS="_"
filter=mktime(substr(filter_date,1,4)" "substr(filter_date,5,2)" "substr(filter_date,7,2) " 00 00 00")}
{
curr_dat=mktime(substr($1,1,4)" "substr($1,5,2)" "substr($1,7,2) " 00 00 00")
}
filter<curr_dat{ exit }
1
' Input_file
Explanation: a detailed, line-by-line explanation of the above.
awk -v filter_date="20200605" ' ##Start the awk program and create the awk variable filter_date, the date set by the OP up to which lines should be printed.
BEGIN{ ##Start the BEGIN section of the program.
FS=OFS="_" ##Set the field separator and the output field separator to _.
filter=mktime(substr(filter_date,1,4)" "substr(filter_date,5,2)" "substr(filter_date,7,2) " 00 00 00")} ##Create the filter variable: mktime applied to substrings of filter_date gives the epoch time of the filter date.
{
curr_dat=mktime(substr($1,1,4)" "substr($1,5,2)" "substr($1,7,2) " 00 00 00") ##Create the curr_dat variable: mktime applied to substrings of the first field gives the epoch time of the current line.
}
filter<curr_dat{ exit } ##If the filter date is earlier than the date of the current line, exit the program.
1 ##1 prints the current line, which happens when the date of the current line is earlier than or equal to the filter date.
' Input_file ##Mentioning Input_file name here.
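The early `exit` gives the right answer only when the input is sorted in ascending date order. The sketch below illustrates that dependency. It swaps `mktime` for a plain numeric comparison so it runs in any POSIX awk, and the function name `print_until_date` is made up for this example.

```shell
# Hypothetical helper: print lines until the first date later than $1, then stop.
print_until_date() {
  awk -F_ -v d="$1" '($1 + 0) > (d + 0) { exit } 1'
}

data='20200605_tsv
20200607_tsv
20200604_tsv'

# Unsorted input: awk exits at 20200607 and never sees 20200604.
printf '%s\n' "$data" | print_until_date 20200605

# Sorted first: both matching lines are printed before the exit.
printf '%s\n' "$data" | sort | print_until_date 20200605
```

The benefit of exiting early is that awk stops reading as soon as the threshold is passed, which can help on long sorted inputs. If sorting is not an option, drop the `exit` rule and test each line individually.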
Upvotes: 2