Ilia Ross

Reputation: 13412

Perl regex as user search input (sanitisation)

I need to make sure that a regex passed as user input cannot accidentally be terminated and turned into arbitrary Perl code, while still working for basic filtering purposes.

Important! This part of the code runs in user-jailed mode, so at worst it can only be self-exploited. In addition, the UI is exposed only to one particular user and is run against a limited number of files, so the potential DoS risk is minimal.

To reach my goal, I wrote a custom function that first applies quotemeta to the whole string and then un-escapes only the characters the regex needs in order to work.

Example:

# Leave a small set of special chars unescaped so the regex
# still works, while preventing possible command injection
# or premature regex termination
my $mask = $in{'mask'};
sub quotemeta_dangerous
{    
    my ($string) = @_;
    $string = quotemeta($string);
    $string =~ s/\\\\/\\/g;
    $string =~ s/\\\+/+/g;
    $string =~ s/\\\*/*/g;
    $string =~ s/\\\$/\$/g;
    $string =~ s/\\\^/\^/g;
    $string =~ s/\\\(/\(/g;
    $string =~ s/\\\)/\)/g;
    $string =~ s/\\\{/\{/g;
    $string =~ s/\\\}/\}/g;
    $string =~ s/\\\[/\[/g;
    $string =~ s/\\\]/\]/g;
    $string =~ s/\\\?/?/g;
    $string =~ s/\\\././g;
    $string =~ s/\\\-/-/g;
    return $string;
}

my $sanitized_mask = quotemeta_dangerous($mask);
if ($filename =~ /$sanitized_mask/) {
    # matched
}

Questions:

  1. Will my solution above achieve my goals safely, given the important side notes mentioned? What potential risks am I not seeing here?

  2. As a side but related question: when running substitutions afterwards, can the replacement part be injected/exploited as well, and if so, how can I safely run substitutions on the contents of matched files?

Example:

$file_contents =~ s/\Q$text_to_find\E/$text_to_replace_with/g;

Is $text_to_replace_with a security risk here when it is passed from the user as is?

Upvotes: 1

Views: 316

Answers (1)

melpomene

Reputation: 85867

  1. I'm not sure what you mean by terminated. As for running arbitrary Perl code, you can't do that from user input (unless the program enables it explicitly with e.g. eval() or use re 'eval'). If you could just inject Perl code from user input, your function wouldn't protect against it: It lets through e.g. (?{system+qq(rm -rf ~)}) in runnable form (runnable, that is, if it were part of the code, not input data).

    What you can do with a user input regex is create a DoS: Make it consume a lot of CPU and hang the program. Your function does not protect against that. For example, try:

    'aaaaaaaaaa' =~ /(((\1?[a-z]*)*)*)*[b-z]/
    

    Or with an even longer chain of a's. (There are probably ways to shorten this code; I was just throwing random bits together to see whether they finished matching quickly.)

    If you want to guard against that, have a look at RE2:

    RE2 was designed and implemented with an explicit goal of being able to handle regular expressions from untrusted users without risk.

    You can use it in your code by doing

    {
        use re::engine::RE2 -strict => 1;
        # now regexes compiled in this scope will use the RE2 engine
        ...
    }
    
  2. That's easy to answer. There's no danger here; $text_to_replace_with is simply treated as a string.

    (If you want to create danger, you need either

    • /e and eval(), or
    • /ee, which is the same thing.

    Technically you don't need /e, but that still leaves a very visible eval() in your code. Again, you can't attack this as a user; you have to code it in.)
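
Both points can be checked with a short script. This is only a minimal sketch: $user_pattern and the sample strings are made-up inputs, not taken from the question's program. It shows that a pattern interpolated at run time is refused if it contains a code block (unless use re 'eval' is in scope), and that the replacement string is inserted as plain data.

```perl
use strict;
use warnings;

# Point 1: a run-time pattern containing a code block is refused
# unless "use re 'eval'" is in scope, so the match dies instead.
# $user_pattern is hypothetical user input.
my $user_pattern = '(?{ print "injected\n" })';
my $matched = eval { 'some file name' =~ /$user_pattern/ };
my $pattern_error = $@;
print "pattern rejected: $pattern_error" if $pattern_error;

# Point 2: the replacement is interpolated once and then used as data;
# variables and code inside the user's string are not expanded again.
my $file_contents        = 'foo bar foo';
my $text_to_find         = 'foo';
my $text_to_replace_with = '$ENV{HOME} @{[ die "injected" ]} $1';
$file_contents =~ s/\Q$text_to_find\E/$text_to_replace_with/g;
print "$file_contents\n";
```

The script prints the rejection message from the regex compiler and then the substituted text, in which the user's $ENV{HOME}, @{[ ... ]} and $1 appear literally. Neither check addresses the DoS case, which still needs something like RE2.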

Upvotes: 3
