Jake

Reputation: 21

Using sed to obtain pattern range through multiple files in a directory

I was wondering if it was possible to use the sed command to find a range between 2 patterns (in this case, dates) and output these lines in the range to a new file.

Right now, I am just looking at one file and getting lines within my time range of the file FileMoverTransfer.log. However, after a certain time period, these logs are moved to new log files with a suffix such as FileMoverTransfer.log-20180404-xxxxxx.gz. Here is my current code:

sed -n '/^'$start_date'/,/^'$end_date'/p;/^'$end_date'/q' FileMoverTransfer.log >> /public/FileMoverRoot/logs/intervalFMT.log

The following, however, doesn't work; sed doesn't seem able to look through all of the files in the directory whose names start with FileMoverTransfer.log:

sed -n '/^'$start_date'/,/^'$end_date'/p;/^'$end_date'/q' FileMoverTransfer.log* >> /public/FileMoverRoot/logs/intervalFMT.log

Any help would be greatly appreciated. Thanks!

Upvotes: 1

Views: 311

Answers (2)

Dario

Reputation: 2723

awk solution

Since the OP confirmed that an awk solution would be acceptable, here is one.

(gunzip -c FileMoverTransfer.log-*.gz; cat FileMoverTransfer.log ) \
  |awk -v st="$start_date" -v en="$end_date" '$1>=st&&$1<=en{print;next}$1>en{exit}'\
  >/public/FileMoverRoot/logs/intervalFMT.log

This solution is functionally almost identical to Barmar’s sed solution, with the difference that his solution, like the OP’s, will print and quit at the first record matching the end date, while mine will print all lines matching the end date and quit at the first record past the end date, without printing it.
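To make that difference concrete, here is a minimal sketch that runs both filters over the same input. The sample lines, the ISO-style dates and the variable names result_awk and result_sed are made up for illustration; the real log format is not shown in the question.

```shell
#!/bin/sh
# Hypothetical sample log: a date in the first whitespace-separated field.
# Note that two lines share the end date.
sample='2018-04-01 job A
2018-04-02 job B
2018-04-03 job C
2018-04-03 job D
2018-04-04 job E'

start_date=2018-04-02
end_date=2018-04-03

# awk filter: prints every line whose date lies in [start, end],
# then exits at the first line past the end date.
result_awk=$(printf '%s\n' "$sample" |
  awk -v st="$start_date" -v en="$end_date" '$1>=st&&$1<=en{print;next}$1>en{exit}')

# sed filter: prints the range and quits at the first line matching the end date.
result_sed=$(printf '%s\n' "$sample" |
  sed -n "/^$start_date/,/^$end_date/p;/^$end_date/q")

echo "awk:"; printf '%s\n' "$result_awk"
echo "sed:"; printf '%s\n' "$result_sed"
```

With this input the awk filter keeps both lines dated 2018-04-03, while the sed filter stops after the first of them.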

Some remarks:

  • The OP didn't specify the date format. I assume it is one that sorts correctly as a plain string (for example YYYY-MM-DD); otherwise a conversion function is needed.

  • The files FileMoverTransfer.log-*.gz must be named in such a way that their alphabetical ordering corresponds to the chronological order (which is probably the case.)

  • I suppose that the dates are separated from the rest of the line by whitespace. If they aren’t, you have to supply the -F option to awk. E.g., if the dates are separated by -, you must write awk -F- ...

  • awk is likely to be faster than sed in this case, because awk simply looks for the separator (whitespace or whatever was supplied with -F) while sed performs a regexp match.

  • There is no concept of a range in my code, only date comparison. The only place where I assume that the lines are ordered is $1>en{exit}, which exits as soon as a line is later than the end date. If you remove that final pattern and its action, the code will read the whole input, but the files no longer need to be ordered.
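As a rough sketch of the last two variations mentioned above, the following uses -F- for dates separated by - and drops the exit rule so that unordered input is handled. The sample lines and the compact date format are assumptions for illustration only, and the trailing sort is an addition to put the selected lines back in chronological order.

```shell
#!/bin/sh
start_date=20180402
end_date=20180403

# Hypothetical log lines: compact dates separated from the message by "-",
# deliberately out of chronological order.
unordered='20180404-job E
20180402-job B
20180401-job A
20180403-job C'

# -F- makes "-" the field separator, so $1 is the date.
# Without the '$1>en{exit}' rule awk scans the whole input, so the input
# order no longer matters; sort then restores chronological order.
result=$(printf '%s\n' "$unordered" |
  awk -F- -v st="$start_date" -v en="$end_date" '$1>=st && $1<=en' |
  LC_ALL=C sort)

printf '%s\n' "$result"
```

The trade-off is that every line of every file is read, which may be noticeable on large logs.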

Upvotes: 0

Barmar

Reputation: 782107

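A sed range cannot span files when sed treats each file separately, as GNU sed does with -s or -i. By default sed reads its input files as one continuous stream, but feeding it a single stream makes that behavior explicit, which matters when the start is in one file and the end is in another.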

You can use cat to concatenate all the files, and pipe this to sed:

cat FileMoverTransfer.log* | sed -n "/^$start_date/,/^$end_date/p;/^$end_date/q" >> /public/FileMoverRoot/logs/intervalFMT.log

And instead of quoting and unquoting the sed command, you can use double quotes so that the variables will be expanded inside it. This will also prevent problems if the variables contain whitespace.
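One caveat: if the rotated logs are gzip-compressed, as the .gz suffix in the question suggests, plain cat would pass the compressed bytes to sed unchanged. The sketch below is one way to handle that, assuming gzip is installed; it decompresses the rotated logs first and appends the current log last so the stream stays chronological. The temporary directory, the rotated file's suffix and the log lines are invented for the demonstration.

```shell
#!/bin/sh
# Build a throwaway directory with one rotated (gzip) log and one current log.
# File names mirror the question; the contents are made up.
tmp=$(mktemp -d)
printf '%s\n' '2018-04-01 job A' '2018-04-02 job B' |
  gzip > "$tmp/FileMoverTransfer.log-20180404-000001.gz"
printf '%s\n' '2018-04-03 job C' '2018-04-04 job D' > "$tmp/FileMoverTransfer.log"

start_date=2018-04-02
end_date=2018-04-03

# Rotated logs first (decompressed to stdout), current log last.
# Double quotes let the shell expand the variables inside the sed script.
result=$( { gzip -cd "$tmp"/FileMoverTransfer.log-*.gz; cat "$tmp/FileMoverTransfer.log"; } |
  sed -n "/^$start_date/,/^$end_date/p;/^$end_date/q" )

printf '%s\n' "$result"
rm -rf "$tmp"
```

Here the range starts in the rotated log and ends in the current one, and sed still sees it as a single range because it reads one stream.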

Upvotes: 1
