speed bump
speed bump

Reputation: 451

AWK: Computing the differences between two columns containing dates

I have the following file:

cat x.txt 

2020-01-04
2020-01-01
2008-03-12
2021-08-09

I am trying to write an awk script that outputs the following:

2020-01-04  2022-03-09  795
2020-01-01  2022-03-09  798
2008-03-12  2022-03-09  5110
2021-08-09  2022-03-09  212

Where column 2 is the current date and column 3 is the difference between column 1 and 2 in days. I have started a script like this but not really getting it:

cat y

#!/usr/bin/env bash
  
awk '

BEGIN {
    FS = OFS = "\t"
    str = "date +%Y-%m-%d"
    str | getline date
    d2 = mktime((gensub(/-/, " ", "g", date)) " 0 0 0")
    d1 = mktime((gensub(/-/, " ", "g", $1)) " 0 0 0")
}

{
    print $1, date, d2 - d1
}

' "${@:--}"

When I run this I get the following:

./y x.txt 

2020-01-04  2022-03-09  1646737201
2020-01-01  2022-03-09  1646737201
2008-03-12  2022-03-09  1646737201
2021-08-09  2022-03-09  1646737201

I am not sure how to work with dates so help is much appreciated.

Upvotes: 1

Views: 56

Answers (1)

Ed Morton
Ed Morton

Reputation: 203665

Using GNU awk for time functions:

$ cat tst.awk
BEGIN {
    OFS = "\t"
    today = strftime("%Y-%m-%d")
    today_secs = mktime(gensub(/-/," ","g",today) " 12 0 0")
    secs_in_day = 24 * 60 * 60
}
{
    secs = mktime(gensub(/-/," ","g",$1) " 12 0 0")
    print $1, today, int( (today_secs - secs) / secs_in_day )
}

$ awk -f tst.awk x.txt
2020-01-04      2022-03-08      794
2020-01-01      2022-03-08      797
2008-03-12      2022-03-08      5109
2021-08-09      2022-03-08      211

It's currently 3/8 rather than 3/9 in my timezone hence the numbers of days being 1 less than the expected output in the question.

Upvotes: 3

Related Questions