awk for date range in seconds

I have a record file that stores the statuses of our systems by date. The script that generates it runs via cron, so the file keeps getting longer. I wrote a script that iterated over every line to process it, and it took a very long time. I've heard that awk is much faster at processing large text files, but I've never used it. Is it possible to use awk to get all entries within a date range? The dates are all in seconds since the epoch, since they were produced with date +%s. Below is an example of the output in which I would like to be able to find a range quickly. For example, how could I get all lines where the first column is between 1344279903 and 1344280204?

1344279903 |  0  | 0 | node  |  1
1344279904 |  0  | 0 | node  |  2
1344279905 |  0  | 0 | node  |  3
1344280202 |  0  | 0 | node  |  1
1344280203 |  0  | 0 | node  |  2
1344280204 |  99  | 0 | node  |  3

Upvotes: 3

Answers (3)

ghoti

Here's my take on this:

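#!/usr/bin/awk -f

BEGIN {
  # Read the range from the first two arguments ("+ 0" forces a numeric value),
  # then blank those arguments so awk doesn't treat them as input files.
  start = ARGV[1] + 0; ARGV[1] = "";
  end   = ARGV[2] + 0; ARGV[2] = "";
}

# Before the range: skip this line.
$1 < start { next }

# Past the range: stop reading (input is assumed to be in chronological order).
$1 > end { exit }

# Otherwise: print the line.
1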

How does this work?

Awk uses a series of "condition { command }" blocks that are applied to each line of input. The BEGIN block is a "magic" one that runs before input starts. (There's a similar END block for the end of input, but we're not using it here.)
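The same pattern/action structure can use an END block too. As a minimal sketch (the function name count_in_range is hypothetical, the bounds are passed with awk's -v option rather than ARGV, and the data are the sample lines from the question), this counts the entries in a range instead of printing them:

```shell
#!/bin/sh
# Hypothetical helper: count lines whose first field lies in [lo, hi].
# Usage: count_in_range LO HI FILE
count_in_range() {
  awk -v lo="$1" -v hi="$2" '
    BEGIN { lo += 0; hi += 0; n = 0 }   # runs once before any input; force numeric bounds
    $1 >= lo && $1 <= hi { n++ }        # runs for every input line that matches
    END { print n }                     # runs once, after the last line
  ' "$3"
}

# Sample data from the question.
cat > data.txt <<'EOF'
1344279903 |  0  | 0 | node  |  1
1344279904 |  0  | 0 | node  |  2
1344279905 |  0  | 0 | node  |  3
1344280202 |  0  | 0 | node  |  1
1344280203 |  0  | 0 | node  |  2
1344280204 |  99  | 0 | node  |  3
EOF

count_in_range 1344279905 1344280203 data.txt   # prints 3
```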

In this script, our BEGIN block sets the "start" and "end" variables from your command-line arguments, then empties those ARGV entries so that awk won't try to open them as input files.
The first condition ($1 < start) causes awk to skip any line that occurs before your start date. Running next tells awk to read the next line of input and start evaluating the conditions from the top again.
The second condition ($1 > end) causes awk to exit once it has passed the end of the range of dates you want to print. (This assumes that your input data are in chronological order, of course.)
The last condition is a "1" by itself. This is an always-true pattern with no action, and awk's default action is to print the current line. It is only reached if neither of the previous conditions was met, since each of them would stop us from reaching this point in the script.
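To see why the early exit depends on chronological order, here is a sketch that runs the same three rules over a deliberately out-of-order file, next to a full scan that checks every line. The file name unsorted.txt, its contents, and the function names are made up for illustration, and the bounds are passed with -v instead of ARGV:

```shell
#!/bin/sh
# Made-up input: the 1344280203 line comes after a line that is already past the range.
printf '%s\n' \
  '1344279905 | 0 | 0 | node | 3' \
  '1344280300 | 0 | 0 | node | 1' \
  '1344280203 | 0 | 0 | node | 2' > unsorted.txt

# Same rules as the script above.
early_exit() {
  awk -v start="$1" -v end="$2" '
    $1 < start { next }   # before the range: skip
    $1 > end   { exit }   # past the range: stop reading entirely
    1                     # otherwise: print
  ' "$3"
}

# Full scan: test every line, never stop early.
full_scan() {
  awk -v start="$1" -v end="$2" '$1 >= start && $1 <= end' "$3"
}

early_exit 1344279905 1344280203 unsorted.txt   # prints only the 1344279905 line
full_scan  1344279905 1344280203 unsorted.txt   # prints the 1344279905 and 1344280203 lines
```

The early exit is what saves time on a large, append-only log, so it is worth keeping when the file really is written in time order, as a cron job appending date +%s lines would be.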

Here it is in action on your sample data, with the script saved as an executable file named awkdate:

ghoti@pc$ ./awkdate 1344279905 1344280203 data.txt
1344279905 |  0  | 0 | node  |  3
1344280202 |  0  | 0 | node  |  1
1344280203 |  0  | 0 | node  |  2
ghoti@pc$

Upvotes: 2

kojiro

With awk?

awk -F'|' '1344279903 <= $1 && $1 <= 1344280204' file

With sed?

sed -n '/1344279903/,/1344280204/p' file
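One caveat with the sed range: it matches text, not numbers, so both timestamps must literally occur in the file. If no line matches the end pattern, sed never closes the range and prints through to the end of the file, whereas a numeric awk comparison stops at the right place. A small sketch, using a made-up end bound 1344280000 that does not occur in the sample data:

```shell
#!/bin/sh
# Sample data from the question.
cat > data.txt <<'EOF'
1344279903 |  0  | 0 | node  |  1
1344279904 |  0  | 0 | node  |  2
1344279905 |  0  | 0 | node  |  3
1344280202 |  0  | 0 | node  |  1
1344280203 |  0  | 0 | node  |  2
1344280204 |  99  | 0 | node  |  3
EOF

lo=1344279903
hi=1344280000   # hypothetical bound: no line in data.txt contains this value

# No line matches /1344280000/, so the range never closes: all six lines print.
sed -n "/$lo/,/$hi/p" data.txt

# awk compares the first field numerically: only the three lines <= $hi print.
awk -v lo="$lo" -v hi="$hi" '$1 >= lo && $1 <= hi' data.txt
```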

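You can make the awk version more efficient by exiting as soon as it passes the end of the range, so the rest of the file is never read (this assumes the file is in chronological order). Testing $1 > 1344280204 rather than $1 == 1344280204 means it still stops early when no line has exactly that timestamp, and it still prints every line that shares the end timestamp:

awk -F'|' '$1 > 1344280204 { exit } 1344279903 <= $1 && $1 <= 1344280204' file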

Upvotes: 3

jspcal

You can use a conditional expression like so:

awk '$1 >= 1344279903 && $1 <= 1344280204 { print $0 }' data.txt
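To avoid editing the timestamps into the program text each time, the bounds can be passed in with awk's standard -v option. A minimal sketch wrapping the expression above in a shell function (the name in_range is hypothetical):

```shell
#!/bin/sh
# Hypothetical wrapper: print lines of FILE whose first field is in [FROM, TO].
# Usage: in_range FROM TO FILE
in_range() {
  awk -v from="$1" -v to="$2" '
    BEGIN { from += 0; to += 0 }          # make sure the bounds compare as numbers
    $1 >= from && $1 <= to { print $0 }
  ' "$3"
}

# Sample data from the question.
cat > data.txt <<'EOF'
1344279903 |  0  | 0 | node  |  1
1344279904 |  0  | 0 | node  |  2
1344279905 |  0  | 0 | node  |  3
1344280202 |  0  | 0 | node  |  1
1344280203 |  0  | 0 | node  |  2
1344280204 |  99  | 0 | node  |  3
EOF

in_range 1344279903 1344280204 data.txt   # prints all six sample lines
in_range 1344280202 1344280203 data.txt   # prints the 1344280202 and 1344280203 lines
```

Unlike the early-exit variants, this reads the whole file, so it also works if the log is not strictly in time order.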

Upvotes: 4
