Reputation: 3
I have a file (maillog) like this:
Feb 22 23:53:39 info postfix[102]: connect from APVLDPDF01[...
Feb 22 23:53:39 info postfix[101]: BA1D7805A1: client=APVLDPDF01[...
Feb 22 23:53:39 info postfix[103]: BA1D7805A1: message-id
Feb 22 23:53:39 info opendkim[139]: BA1D7805A1: DKIM-Signature field added
Feb 22 23:53:39 info postfix[763]: ED6F3805B9: to=<[email protected]>, relay...
Feb 22 23:53:39 info postfix[348]: ED6F3805B9: removed
Feb 22 23:53:39 info postfix[348]: BA1D7805A1: from=<[email protected]>,...
Feb 22 23:53:39 info postfix[102]: disconnect from APVLDPDF01...
Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<[email protected]>,status=sent
Feb 22 23:53:39 info postfix[348]: 59AE0805B4: removed
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<[email protected]>, status=sent
Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed
and a second file (mailids) like this:
6DBDD8039F:
3B15BC803B:
BA1D7805A1:
2BD19803B4:
I want to get an output file that contains something like this:
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<[email protected]>, status=sent
I only want the lines whose ID appears in the second file; in this example, only the ID BA1D7805A1: appears in both files. There is one more condition: the line must have the form "ID to=<", meaning that only lines containing both an ID from the second file and "to=<" should be output.
I've found different solutions, but I have a huge performance problem. The maillog file is 2 GB, about 10 million lines, and the mailids file has around 32,000 lines.
The process takes too long, and I've never seen it finish. I've tried awk and grep commands, but I haven't found the best way.
Upvotes: 0
Views: 102
Reputation: 45353
It is better to add the -w option:
-w, --word-regexp
Select only those lines containing matches that form whole
words. The test is that the matching substring must either be
at the beginning of the line, or preceded by a non-word
constituent character. Similarly, it must be either at the end
of the line or followed by a non-word constituent character.
Word-constituent characters are letters, digits, and the
underscore.
Here is the common command I use.
grep -Fwf mailids maillog |grep 'to=<'
If the ID is always in column 6, try this awk one-liner:
awk 'NR==FNR{a[$1];next} /to=</&&$6 in a ' mailids maillog
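To make the one-liner easier to follow, here is a commented sketch of the same idea run on a tiny sample. It assumes the queue ID is always field 6 and that "to=<" starts field 7, which is stricter than matching "to=<" anywhere on the line. The temporary directory, the variable names, and the example.com addresses are placeholders made up for the demo, not data from the real log.

```shell
# Build a small sample in a scratch directory (hypothetical test data).
workdir=$(mktemp -d)
cat > "$workdir/maillog" <<'EOF'
Feb 22 23:53:39 info postfix[348]: ED6F3805B9: removed
Feb 22 23:53:39 info postfix[348]: BA1D7805A1: from=<sender@example.com>,...
Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<rcpt1@example.com>,status=sent
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<rcpt2@example.com>, status=sent
Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed
EOF
cat > "$workdir/mailids" <<'EOF'
6DBDD8039F:
3B15BC803B:
BA1D7805A1:
2BD19803B4:
EOF

awk_result=$(awk '
    # First file (mailids): remember every ID as an array key.
    NR == FNR { ids[$1]; next }
    # Second file (maillog): print the line when field 6 is a known ID
    # and field 7 begins with "to=<".
    ($6 in ids) && index($7, "to=<") == 1
' "$workdir/mailids" "$workdir/maillog")

printf '%s\n' "$awk_result"
rm -r "$workdir"
```

Because the IDs are stored as array keys, each log line costs one hash lookup, so the run time should not depend much on how many IDs are in mailids.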
Upvotes: 1
Reputation: 16016
grep -F -f mailids maillog | grep 'to=<'
From the grep man page:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by
newlines, any of which is to be matched. (-F is specified by
POSIX.)
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing. (-f is
specified by POSIX.)
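Since the question is about speed on a 2 GB log, one variation that may be worth trying is to filter on 'to=<' first and to run both greps in the C locale. This is only a sketch under assumptions: I have not timed it on a file of that size, and the sample data, the temporary directory, and the example.com addresses below are made up for the demo.

```shell
# Build a small sample in a scratch directory (hypothetical test data).
workdir=$(mktemp -d)
cat > "$workdir/maillog" <<'EOF'
Feb 22 23:53:39 info postfix[348]: BA1D7805A1: from=<sender@example.com>,...
Feb 22 23:53:39 info postfix[842]: 59AE0805B4: to=<rcpt1@example.com>,status=sent
Feb 22 23:53:41 info postfix[918]: BA1D7805A1: to=<rcpt2@example.com>, status=sent
Feb 22 23:53:41 info postfix[348]: BA1D7805A1: removed
EOF
cat > "$workdir/mailids" <<'EOF'
6DBDD8039F:
BA1D7805A1:
EOF

# Step 1: keep only delivery lines, so the ID lookup sees fewer lines.
# Step 2: match the remaining lines against the fixed-string ID list.
# LC_ALL=C makes grep compare bytes instead of multibyte characters,
# which is often faster in UTF-8 locales.
grep_result=$(LC_ALL=C grep -F 'to=<' "$workdir/maillog" |
              LC_ALL=C grep -F -f "$workdir/mailids")

printf '%s\n' "$grep_result"
rm -r "$workdir"
```

The output is the same as with the original order of the two greps; only the amount of work done by the second grep changes.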
Upvotes: 2