ParoX

Reputation: 5933

grep vs Perl. Should I mod this to work, or should I just replace it with Perl code?

I am writing something that will allow users to search through a log of theirs. Currently I have this, where $in{'SEARCH'} is the string they are searching for.

open(COMMAND, "grep \"$in{'SEARCH'}\" /home/$palace/palace/logs/$logfile | tail -n $NumLines |");
$f = <COMMAND>;
if ($f) {
    print $Title;
    print "<div id=log>\n";
    do { print $f . "<br>"; } while ($f = <COMMAND>);
    print "</div>\n";
} else {
    print $Error;
}
close(COMMAND);

However, I noticed that they could easily make the script and the grep command error out by using a double quote (") or a backslash. Therefore I added this piece of code:

$in{'SEARCH'} =~ s|\\|\\\\|g;
$in{'SEARCH'} =~ s|"|\Q\"\E|g;

open(COMMAND, "grep \"$in{'SEARCH'}\" /home/$palace/palace/logs/$logfile | tail -n $NumLines |");
$f = <COMMAND>;
if ($f) {
    print $Title;
    print "<div id=log>\n";
    do { print $f . "<br>"; } while ($f = <COMMAND>);
    print "</div>\n";
} else {
    print $Error;
}
close(COMMAND);

However, I am still having problems. The grep command does not like the \ in its search string, giving errors like:

grep "\\" /home/test/palace/logs/chat.log
grep: Trailing backslash

Should I even continue trying to use the grep command, and if so, what is a good Perl function for escaping the weird characters that break it, such as turning " into \"? Or, instead of messing around, should I just use Perl code to accomplish the same thing, even though I've read it wouldn't be as fast as grep?


Update: 7-5-2009 5:20 PM EST


Many people contributed code, especially attempts to be faster than the system grep. As of right now, it's still the fastest. Here is how the benchmarks go:

Using system grep:

Top   of  file: 1 wallclock secs ( 0.00 usr 0.01 sys + 0.13 cusr 0.15 csys = 0.29 CPU)
Bottom of file: 1 wallclock secs ( 0.00 usr 0.00 sys + 0.21 cusr 0.18 csys = 0.39 CPU)

Using Hynek's example (push and shift):

Top   of  file: 4 wallclock secs ( 3.78 usr + 0.19 sys = 3.97 CPU)
Bottom of file: 4 wallclock secs ( 3.86 usr + 0.19 sys = 4.05 CPU)

Using my Perl example (using the reverse function):

Top   of  file: 6 wallclock secs ( 4.76 usr + 1.45 sys = 6.21 CPU)
Bottom of file: 5 wallclock secs ( 3.93 usr + 1.44 sys = 5.37 CPU)

Using my File::ReadBackwards example:

Top   of  file:11 wallclock secs (11.20 usr + 0.11 sys = 11.31 CPU)
Bottom of file: 1 wallclock secs ( 0.59 usr + 0.01 sys = 0.60 CPU)

Using xcramps's example (built-in grep):

Top   of  file: 9 wallclock secs ( 8.04 usr + 0.47 sys = 8.51 CPU)
Bottom of file: 8 wallclock secs ( 8.16 usr + 0.43 sys = 8.59 CPU)

Upvotes: 4

Views: 2325

Answers (8)

user3458

Reputation:

Silly idea, but how about using grep with single quotes when there is no single quote in the search string, and using Perl when there is? This way you get fast performance and security at the same time.
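
A rough sketch of that dispatch (my illustration, not code from the answer; it reuses the question's $in{'SEARCH'}, $palace, $logfile and $NumLines, and adds -F so grep treats the pattern as a fixed string):

if (index($in{'SEARCH'}, "'") < 0) {
    # No single quote in the input, so single-quoting is safe:
    # the shell passes everything between '...' through untouched,
    # and -F tells grep to take the pattern as a fixed string.
    open(COMMAND, "grep -F '$in{'SEARCH'}' /home/$palace/palace/logs/$logfile | tail -n $NumLines |")
        or die "Can't run grep: $!";
}
else {
    # The input contains a single quote; match it literally in Perl instead.
    open(my $log, "<", "/home/$palace/palace/logs/$logfile") or die "Can't open log: $!";
    # ... keep lines where index($_, $in{'SEARCH'}) >= 0, printing the last $NumLines
}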

Upvotes: 0

xcramps

Reputation: 1213

Perl has a built-in grep:

open(IN, $file) || die "Can't open $file: $!";
my @result = grep { /\Q$in{'SEARCH'}\E/ } <IN>;   # \Q...\E makes the match literal
close(IN);

# this is the equivalent of your tail
my $end = $#result;
my $begin = $end - $NumLines + 1;
$begin = 0 if $begin < 0;
print "@result[$begin..$end]";

Upvotes: 0

jpbaudoin

Reputation: 81

In terms of performance, Perl is excellent. It may not be as fast as the grep process, but I think the difference would not be noticeable to the end user.

Since one of your goals is control over the user input, I suggest you go with Perl, because it can give you as much control over that input as you may need now or in the future.
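
For example (a hypothetical sketch, not from the answer, given a filehandle $fh opened on the log): matching with index() treats the input as a literal substring, so nothing the user types is ever interpreted as a shell or regex metacharacter.

# index() does a plain substring match: no character in
# $in{'SEARCH'} can act as a metacharacter here.
while (my $line = <$fh>) {
    print $line, "<br>" if index($line, $in{'SEARCH'}) >= 0;
}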

Upvotes: 1

ysth

Reputation: 98388

You can bypass shell-quoting issues completely by specifying the arguments separately:

open my $grepfh, "-|", "grep", "-F", $in{'SEARCH'}, "/home/$palace/palace/logs/$logfile" or die "grep open error: $! $?\n";

Including the tail, without any error checking:

# @grepargs and @tailargs hold the argument lists, e.g.
# ("-F", $in{'SEARCH'}, "/home/$palace/palace/logs/$logfile") and ("-n", $NumLines)
open( my $resultfh, '-|' )   # forks; the parent reads the child's STDOUT from $resultfh
or do {                      # in the child, open returned 0, so this block runs
    pipe(my $rdfh, my $wrfh);            # pipe connecting grep to tail
    if (fork) {
        open( STDOUT, '>&', $wrfh );     # this process becomes grep, writing into the pipe
        exec( "grep", @grepargs );
    }
    else {
        open( STDIN, '<&', $rdfh );      # this process becomes tail, reading from the pipe
        exec( "tail", @tailargs );
    }
};
while (my $line = readline($resultfh)) {
    # ... process $line ...
}

Upvotes: 2

Hynek -Pichi- Vychodil

Reputation: 26121

Try this approach:

# \Q...\E quotes regex metacharacters, so the search text is matched literally
my $re = qr/\Q$in{'SEARCH'}\E/;

open my $fh, '<', "/home/$palace/palace/logs/$logfile" or die "Can't open log: $!";

my @lines;
while (<$fh>) {
    next unless m/$re/;
    push @lines, $_;
    shift @lines if @lines > $NumLines;   # keep only the last $NumLines matches
}

print @lines;

Upvotes: 2

ParoX

Reputation: 5933

I did a quick Perl write-up of how it would work; I'm not sure if this is optimized or the best way, but here it is:

my $num_matches = 0;
my $logpath = "/home/$palace/palace/logs/$logfile";
open my $fh, "<", $logpath or die "Can't open [$logpath]: $!";

print $Title;
print "<div id=log>\n";

# slurp the file and walk it newest-line-first
my @lines = reverse <$fh>;
foreach my $line (@lines) {
    if ($line =~ m/\Q$in{'SEARCH'}\E/) {
        print $line . "<br>";
        $num_matches++;
    }
    if ($num_matches >= $NumLines) {
        last;
    }
}

print "</div>\n";

I do want to note that I ran benchmarks (http://perldoc.perl.org/Benchmark.html) on the Perl way (above) vs. the grep way (from the original question); these are the results:

PERL WAY: the code took: 4 wallclock secs ( 2.89 usr + 0.96 sys = 3.85 CPU)

GREP WAY: the code took: 1 wallclock secs ( 0.02 usr 0.04 sys + 0.31 cusr 0.14 csys = 0.51 CPU)

So I need to optimize my Perl code. Any ideas? I thought that by reading the file in reverse, it could find the matches ASAP and wouldn't need to read any further, saving it from reading the whole file into memory. However, reverse <$fh> has to slurp the whole file into memory anyway before it can reverse the lines. Perhaps I should run tac $file externally and read its output, but again, I am trying to avoid external command calls.
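
For reference, the File::ReadBackwards version from the benchmarks above looks roughly like this (my reconstruction; the module hands back lines starting from the end of the file, so it only reads as much of the file as it needs):

use File::ReadBackwards;

my $bw = File::ReadBackwards->new($logpath)
    or die "Can't open [$logpath]: $!";

# Walk the file newest-line-first, stopping after $NumLines matches.
my $num_matches = 0;
while (defined(my $line = $bw->readline)) {
    if ($line =~ m/\Q$in{'SEARCH'}\E/) {
        print $line . "<br>";
        last if ++$num_matches >= $NumLines;
    }
}

That explains the benchmark shape: matches near the bottom of the file are found almost instantly, while matches near the top force it to scan nearly the whole file backwards.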

Upvotes: 1

Jonathan Leffler

Reputation: 753555

I'll agree with Ry4an that going with Perl is probably sensible.

If you want to fix up your code, you need to use single quotes, not double quotes, around the arguments that you don't want the shell messing with. Inside double quotes, both backticks and the $(...) notation will execute commands, and ${variable} and the like will be expanded; dealing with all those notations is too much like hard work, and single quotes are much simpler.

Note that an embedded single quote has to be replaced by the four-character sequence '\'' — assuming the string as a whole is wrapped in single quotes (that was fun!). The first quote closes the current single-quoted string, the backslash plus quote gives a literal single quote, and the last quote starts a new single-quoted string. You don't need to escape any other characters.


Taking your code:

my $search = $in{SEARCH};
$search =~ s/'/'\\''/g;

open(COMMAND, "grep '$search' /home/$palace/palace/logs/$logfile | tail -n $NumLines |");
$f = <COMMAND>;
if  ($f)
{
    print $Title;
    print "<div id=log>\n";
    do { print $f."<br>";} while ($f = <COMMAND>);
    print "</div>\n";
}
else
{
    print $Error;
}
close(COMMAND);
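
To see the quoting in action (my example, not from the answer): a search for it's becomes it'\''s after the substitution, so the command handed to the shell is

grep 'it'\''s' /home/test/palace/logs/chat.log

which the shell parses as 'it', an escaped quote, and 's' pushed together — the single literal argument it's.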

Upvotes: 6

Ry4an Brase

Reputation: 78330

Definitely just use Perl. Keeping all the work in a single process will speed things up, and invoking a subcommand with user-specified args is just begging for exploits. If you do decide to stick with the subcommand, quotemeta might be worth looking at.
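
For illustration (my sketch, not Ry4an's code): quotemeta puts a backslash in front of every non-word character, which neutralizes the shell; the shell then strips those backslashes before grep sees the string, so adding -F is what stops grep from treating the input as a regex.

# Sketch only: the backslashes from quotemeta protect the shell,
# and -F makes grep match the (now raw) string literally.
# Embedded newlines remain an edge case worth rejecting up front.
my $safe = quotemeta $in{'SEARCH'};
open(COMMAND, "grep -F -- $safe /home/$palace/palace/logs/$logfile | tail -n $NumLines |")
    or die "Can't run grep: $!";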

Upvotes: 4
