Reputation: 1823
I have a requirement to grep patterns from a file, but I need the matches in the order of the patterns.
$ cat patt.grep
name1
name2
$ grep -f patt.grep myfile.log
name2:some xxxxxxxxxx
name1:some xxxxxxxxxx
grep prints matching lines in the order they occur in myfile.log: name2 is found first, so it is printed first, followed by name1. But I need name1 first, following the order of the patterns in patt.grep.
The expected output is:
name1:some xxxxxxxxxx
name2:some xxxxxxxxxx
Upvotes: 11
Views: 6831
Reputation: 390
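Here's a Python script that wraps grep to do it. Features:
- supports the --only-matching option
- other options (e.g. -Fw) are passed through to grep; because matches are grouped by the matched text, the patterns should be fixed strings, as with grep -Fw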
#!/usr/bin/env python3
# grep -f in order of pattern file.
# If a pattern occurs multiple times in the input, all matches are printed thereunder.
import argparse
import sys
import subprocess
from collections import defaultdict


def eprint(*args, **kwargs):
    print('kgrep.py', *args, file=sys.stderr, **kwargs)


class FileHelper:
    def __init__(self, filepath):
        self.file = open(filepath, "rb", buffering=1024*1024)
        self.line_nb = 0

    # Loop through our file until the specified line number
    def readline(self, line_nb):
        if self.line_nb == line_nb:
            # already got that one
            return None
        assert line_nb > self.line_nb
        line = None
        while self.line_nb < line_nb:
            line = self.file.readline()
            self.line_nb += 1
        if not line:  # readline() returns b'' at EOF, never None
            eprint("line_nb", line_nb, "not found")
            sys.exit(1)
        # we use the \n later anyway, so do not line.rstrip()
        return line


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--file', '-f', help="", required=True)
    parser.add_argument('--only-matching', '-o', action='store_true', help="")
    args, unknown_args = parser.parse_known_args()

    input_file = None
    for arg in unknown_args:
        if arg.startswith('-'):
            continue
        if input_file is not None:
            eprint('multiple input files not supported:', input_file, arg)
            sys.exit(1)
        input_file = arg
    if input_file is None:
        eprint('missing input file')
        sys.exit(1)

    grep_args = 'grep -f - -o -n'.split(' ')
    grep_args.extend(unknown_args)
    proc = subprocess.Popen(grep_args, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            stderr=sys.stderr, bufsize=1024*1024)

    # First pass all needles to grep (but remember them)
    input_ = sys.stdin.buffer if args.file == '-' else open(args.file, "rb")
    needles = []
    while True:
        line = input_.readline()
        if not line:
            break
        proc.stdin.write(line)
        needles.append(line.rstrip())
    proc.stdin.flush()
    proc.stdin.close()  # close stdin to signal end of input

    only_m = args.only_matching
    helper_file = FileHelper(input_file)
    matches_dict = defaultdict(list)
    # Read grep's line-number prefixed output and extract the full line
    while True:
        line = proc.stdout.readline()
        if not line:
            break
        line_nb, grep_match = line.split(b':', 1)
        full_line = grep_match if only_m else helper_file.readline(int(line_nb))
        if full_line is not None:
            matches_dict[grep_match.rstrip()].append(full_line)

    for needle in needles:
        line = matches_dict.get(needle)
        if line is None:
            eprint("warning: needle not found:", needle.decode())
            continue
        # we remember that we already printed a match by setting the first el to None
        if line[0] is None:
            continue
        for m in line:
            sys.stdout.buffer.write(m)
        line[0] = None
    sys.exit(proc.wait())


if __name__ == '__main__':
    try:
        main()
    except (BrokenPipeError, KeyboardInterrupt) as e:
        # avoid additional broken pipe error. s. https://stackoverflow.com/a/26738736
        sys.stderr.close()
        sys.exit(getattr(e, 'errno', 130))  # KeyboardInterrupt has no errno
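For comparison, the same idea (collect the matches per pattern, then print them in pattern order) can be sketched in pure Python without spawning grep. This is a minimal sketch, assuming the patterns are Python `re` regular expressions (not grep's basic regex syntax) and that, like grep, each log line is printed at most once, under the first pattern it matches; `grep_in_pattern_order` is a hypothetical name.

```python
import re
from typing import Iterable, List


def grep_in_pattern_order(patterns: Iterable[str], lines: Iterable[str]) -> List[str]:
    """Return matching lines grouped by pattern, in the order of `patterns`.

    Hypothetical helper: a single pass over `lines`; each line goes into the
    bucket of the first pattern (in pattern-file order) that matches it.
    """
    compiled = [re.compile(p) for p in patterns]
    buckets: List[List[str]] = [[] for _ in compiled]
    for line in lines:
        for i, rx in enumerate(compiled):
            if rx.search(line):
                buckets[i].append(line)
                break  # like grep, emit each line at most once
    # Concatenate the buckets in pattern order
    return [line for bucket in buckets for line in bucket]


if __name__ == "__main__":
    patterns = ["name1", "name2"]
    log = ["name2:some xxxxxxxxxx", "name1:some xxxxxxxxxx"]
    for out in grep_in_pattern_order(patterns, log):
        print(out)
```

To run it on the real files, the inputs could be read with `open("patt.grep").read().splitlines()` and `open("myfile.log").read().splitlines()`.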
Upvotes: 0
Reputation: 51
This should do it
awk -F":" 'NR==FNR{a[$1]=$0;next}{ if ($1 in a) {print a[$0]} else {print $1, $1} }' myfile.log patt.grep > z
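The awk command above builds a lookup table from the log, keyed on the text before the first `:`, and then walks patt.grep in order. A rough Python equivalent is sketched below, assuming (as the sample data suggests) that each pattern is an exact match for that first field. Unlike `a[$1]=$0`, this sketch keeps every matching line rather than only the last one, and it skips patterns with no match instead of printing the pattern twice as the awk `else` branch does; `order_by_first_field` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def order_by_first_field(patterns: Iterable[str], log_lines: Iterable[str]) -> List[str]:
    """Return log lines in pattern order, matching each pattern exactly
    against the first ':'-separated field (hypothetical helper)."""
    by_key: Dict[str, List[str]] = defaultdict(list)
    for line in log_lines:
        key = line.split(":", 1)[0]  # same field as awk -F: ... $1
        by_key[key].append(line)
    result: List[str] = []
    for pattern in patterns:
        # .get() avoids inserting empty entries into the defaultdict
        result.extend(by_key.get(pattern, []))
    return result


if __name__ == "__main__":
    log = ["name2:some xxxxxxxxxx", "name1:some xxxxxxxxxx"]
    print("\n".join(order_by_first_field(["name1", "name2"], log)))
```

Because this is a dictionary lookup rather than a regex match, it reads the log only once and is fast even with many patterns, at the cost of only supporting exact first-field matches.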
Upvotes: 1
Reputation: 438073
This can't be done in grep alone.
For a simple and pragmatic but inefficient solution, see owlman's answer. It invokes grep once for each pattern in patt.grep.
If that's not an option, consider the following approach:
grep -f patt.grep myfile.log |
awk -F: 'NR==FNR { l[$1]=$0; next } $1 in l {print l[$1]}' - patt.grep
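How it works:
- grep matches all patterns in a single pass.
- awk then reorders grep's output to follow the order of patt.grep:
  - it first reads all output lines (from stdin, -, i.e., through the pipe) into an associative array, using the first field (split on :) as the key;
  - it then reads patt.grep line by line and prints the corresponding output line, if any.
Constraints:
- All patterns in patt.grep are assumed to match the first :-separated token in the log file, as implied by the sample output data in the question.
- Each pattern is assumed to match only one line: the array keeps only the last line per key, so if multiple matches are possible, the awk solution would have to be made more sophisticated.
Upvotes: 0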
Reputation: 161
You can pipe patt.grep to xargs, which will pass the patterns to grep one at a time.
By default, xargs appends the arguments to the end of the command. But in this case, grep needs myfile.log to be the last argument, so use the -I{} option to tell xargs to replace {} with each argument.
cat patt.grep | xargs -I{} grep {} myfile.log
Upvotes: 6
Reputation: 123518
A simple workaround would be to sort the log file before grep:
grep -f patt.grep <(sort -t: myfile.log)
However, this might not yield results in the desired order if patt.grep is not sorted.
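To preserve the order specified in the pattern file, you might use awk instead, indexing the log by its first field and then printing in the order of patt.grep:
awk -F: 'NR==FNR{a[$1]=a[$1] $0 ORS; next} $0 in a {printf "%s", a[$0]}' myfile.log patt.grep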
Upvotes: 1
Reputation: 2338
I tried the same situation and solved it easily with the command below. If your data is in the same format as shown, and the patterns in patt.grep happen to be in sorted order, you can use this:
grep -f patt.grep myfile.log | sort
Upvotes: 1
Reputation: 1953
Use the regexes in patt.grep one after another, in order of appearance, by reading the file line by line:
while IFS= read -r ptn; do grep -- "$ptn" myfile.log; done < patt.grep
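The loop above rescans myfile.log once per pattern, which is simple but reads the log P times for P patterns. A Python sketch of the same per-pattern strategy follows, assuming the patterns are Python `re` expressions rather than grep's basic regular expressions; `grep_each_pattern` is a hypothetical name. As with repeated grep calls, a line that matches several patterns is printed once for each of them.

```python
import re
from typing import List, Sequence


def grep_each_pattern(patterns: Sequence[str], lines: Sequence[str]) -> List[str]:
    """Emulate `while read ptn; do grep "$ptn" file; done`: one full scan per pattern."""
    result: List[str] = []
    for pattern in patterns:
        rx = re.compile(pattern)
        # Each scan keeps the log file's own line order
        result.extend(line for line in lines if rx.search(line))
    return result


if __name__ == "__main__":
    log = ["name2:some xxxxxxxxxx", "name1:some xxxxxxxxxx"]
    for out in grep_each_pattern(["name1", "name2"], log):
        print(out)
```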
Upvotes: 2