Reputation: 555
I'm doing a simple grep for lines starting with some pattern, like:
grep -E "^AAA" myfile > newfile
In the same pass, I would also like to redirect the non-matching lines to another file.
I know I could simply run grep twice and use -v the second time, but the files are (relatively) huge, and reading them only once would save quite a bit of valuable time.
I was thinking of something along the lines of redirecting the non-matching lines to stderr, like:
grep -E -magic_switch "^AAA" myfile > newfile 2> newfile.nonmatch
Is this trick somehow possible with grep or should I rather just code it?
(In case it matters: I'm coding this in a bash script.)
Upvotes: 5
Views: 1013
Reputation: 303
You can use process substitution to duplicate the stream as the file is being read (inspired by https://unix.stackexchange.com/a/71511). This should be almost as fast as a single grep, since the file is still read only once.
Something like this should work:
cat file.txt | tee >(grep 'pattern' > matches.txt) | grep -v 'pattern' > non-matches.txt
Upvotes: 2
Reputation: 1
Here is a function for you:
function perg {
awk '{y = $0~z ? "out" : "err"; print > "/dev/std" y}' z="$1" "$2"
}
Use it with a file
perg ^AAA myfile > newfile 2> newfile.nonmatch
or from a pipe
cat myfile | perg ^AAA > newfile 2> newfile.nonmatch
Upvotes: 0
Reputation: 360535
This will work:
awk '/pattern/ {print; next} {print > "/dev/stderr"}' inputfile
or
awk -v matchfile=/path/to/file1 -v nomatchfile=/path/to/file2 '/pattern/ {print > matchfile; next} {print > nomatchfile}' inputfile
or
#!/usr/bin/awk -f
BEGIN {
    pattern = ARGV[1]
    matchfile = ARGV[2]
    nomatchfile = ARGV[3]
    for (i=1; i<=3; i++) delete ARGV[i]
}
$0 ~ pattern {
    print > matchfile
    next
}
{
    print > nomatchfile
}
Call the last one like this:
./script.awk regex outputfile1 outputfile2 inputfile
Upvotes: 6
Reputation: 140796
I don't believe this can be done with grep, but it's only a few lines of Perl:
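#! /usr/bin/perl
# usage: script regexp match_file nomatch_file < input
my $regexp = shift;
# stop early if either output file cannot be opened
open(MATCH, ">", shift) or die "cannot open match file: $!";
open(NOMATCH, ">", shift) or die "cannot open nomatch file: $!";
while(<STDIN>) {
    if (/$regexp/o) {
        print MATCH $_;
    } else {
        print NOMATCH $_;
    }
}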
or Python, if you prefer:
#! /usr/bin/python
# usage: script regexp match_file nomatch_file < input
import sys
import re
exp = re.compile(sys.argv[1])
match = open(sys.argv[2], "w")
nomatch = open(sys.argv[3], "w")
for line in sys.stdin:
    # search() matches anywhere in the line, as grep does;
    # match() would only match at the start of the line
    if exp.search(line):
        match.write(line)
    else:
        nomatch.write(line)
(Both totally untested. Your mileage may vary. Void where prohibited.)
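For anyone who wants something to start from, here is a minimal sketch of the same single-pass split wrapped in a function. The name split_lines and its parameters are made up for illustration, the files are assumed to be text in the default encoding, and Python's re syntax is not identical to grep -E, so treat it as a starting point rather than a drop-in replacement.

```python
import re


def split_lines(pattern, in_path, match_path, nomatch_path):
    """Split in_path into matching and non-matching lines in a single pass.

    Returns a (matched, unmatched) tuple of line counts.
    """
    regex = re.compile(pattern)
    matched = unmatched = 0
    with open(in_path) as src, \
            open(match_path, "w") as hit, \
            open(nomatch_path, "w") as miss:
        for line in src:
            # search() looks anywhere in the line, like grep does;
            # an explicit ^ in the pattern still anchors to the line start.
            if regex.search(line):
                hit.write(line)
                matched += 1
            else:
                miss.write(line)
                unmatched += 1
    return matched, unmatched
```

For the question's example this would be called as split_lines("^AAA", "myfile", "newfile", "newfile.nonmatch"). The input is iterated line by line, so it is never loaded into memory as a whole.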
Upvotes: 2
Reputation: 272377
I fear this may not be possible with grep alone. I'd use Perl and do something like:
# read the input (files given as arguments, or stdin) line by line
while (<>) {
    if (/^AAA/) {
        print STDOUT $_;
    } else {
        print STDERR $_;
    }
}
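If Perl is not available, roughly the same stdout/stderr routing can be sketched in Python. The function name route_lines and its stream parameters are hypothetical; they default to stdin, stdout and stderr so that shell redirection of the form > newfile 2> newfile.nonmatch would separate the two sets of lines.

```python
import re
import sys


def route_lines(pattern, source=None, matched=None, unmatched=None):
    """Write matching lines to one stream and all other lines to another."""
    # Fall back to the standard streams when none are passed in.
    source = sys.stdin if source is None else source
    matched = sys.stdout if matched is None else matched
    unmatched = sys.stderr if unmatched is None else unmatched
    regex = re.compile(pattern)
    for line in source:
        # search() matches anywhere in the line, as grep does.
        target = matched if regex.search(line) else unmatched
        target.write(line)
```

Calling route_lines(sys.argv[1]) at the bottom of a script would give it the same command-line shape as the grep call in the question. One caveat is that any real error message written to stderr would end up mixed in with the non-matching lines.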
Upvotes: 2