Mehdi Haghgoo


How to check the sanity of an SQL query in PHP

I am writing a DbAdapter class in PHP. To avoid SQL injection attacks in conditional selects, I need a way to check the sanity of the SQL query before running it. Given that prepared statements make the implementation very complicated, is there a quick way to check the sanity of the query (the WHERE clause in particular, as is the case here) before executing it in the heart of the class? For example, a helper method that returns false for malicious or suspicious queries would be fine.

My class code:

require_once './config.php';

class DbAdapter
{

    private $link;


    /**
     * DbAdapter constructor.
     */
    public function __construct()
    {
        $this->link = new mysqli(DBHOST, DBUSER, DBPASS, DBNAME);
        if ($this->link->connect_errno) {
            die($this->link->connect_error);
        }
    }

    /**
     * @param string $table
     * @param array  $columns     column names to select; an empty array means all columns
     * @param array  $conditions  column => value pairs for the WHERE clause
     * @return bool|mysqli_result
     */
    public function select($table, $columns = [], $conditions = [])
    {
        $colsString = $this->extractCols($columns);
        $whereString = $this->extractConditions($conditions);
        $sql = "SELECT $colsString FROM `$table` " . $whereString;
        return $this->link->query($sql);
    }

    public function __destruct()
    {
       $this->link->close();
    }

    private function extractCols(array $columns)
    {
        if (!$columns) {
            return '*';
        }

        return implode(',', $columns);
    }

    private function extractConditions(array $conditions)
    {
        if (!$conditions) {
            return "";
        }

        // each value is interpolated directly into the SQL string
        $clauses = [];
        foreach ($conditions as $key => $value) {
            $clauses[] = "$key=$value";
        }

        return "WHERE " . implode(" AND ", $clauses);
    }
}

Upvotes: 0


Answers (2)

Bill Karwin


Allowing arbitrary input to become part of your SQL code is a fundamentally flawed design. There's no way to make that "sane."

Some technologies, such as database firewalls, attempt to do what you're asking: detect when queries have been compromised by an SQL injection attack. The trouble is that it's very difficult to distinguish a compromised SQL query from one that merely includes legitimate dynamic content.

The result is that injection-detection methods are not reliable: they fail to detect some cases of injection, and they also misidentify legitimate queries as injection.

Another approach is to use whitelisting of SQL queries. That is, enumerate all the legitimate SQL query forms used by a given application, and allow only those queries to run. This requires that you run the app in a kind of "teaching mode" before you deploy, to identify all the legitimate SQL queries. Then turn on the database firewall to block anything that wasn't a known SQL query at the time you did the test run.

This has disadvantages too. It doesn't account for SQL queries that need to be fully dynamic, like pivot table queries or constructive conditions (e.g. your query gains a variable number of terms in the WHERE clause based on app logic).

The best method of preventing SQL injection is still code review. Make sure any dynamic values are passed as query parameters using a prepared statement. You claim that this makes the code "very complicated," but that's not true:

$sql = "SELECT ...";
$stmt = $pdo->prepare($sql);
$stmt->execute($paramValuesArray);

At the very least, it's no more complex than writing all the code you showed for appending terms to an SQL statement.
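For example, a parameterized version of the conditional select from the question might look roughly like this. This is only a sketch: the $pdo connection details, the users table, and the $conditions array are made-up placeholders, not code from the question.

&lt;?php
// Sketch only: the question's conditional select with the values bound as
// parameters. Connection details and table/column names are illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$conditions = ['status' => 'active', 'type' => 2];

// Identifiers (table and column names) cannot be bound as parameters,
// so they should come from a whitelist; only the values are bound.
$clauses = [];
foreach (array_keys($conditions) as $col) {
    $clauses[] = "`$col` = ?";
}
$where = $clauses ? ' WHERE ' . implode(' AND ', $clauses) : '';

$stmt = $pdo->prepare('SELECT * FROM `users`' . $where);
$stmt->execute(array_values($conditions));
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

The only part binding can't cover is the identifiers, which is why table and column names should be restricted to a known list.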

Upvotes: 1

elixenide


Short Answer

You can use EXPLAIN, as in EXPLAIN SELECT foo FROM table_bar. How to interpret the results programmatically for "sanity," however, is a much more difficult question. You'll need a programmatic definition of "sanity," like "examines more than n rows" or "involves more than t tables."
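Here is a minimal sketch of that idea. The connection details, the thresholds, and the looksSane() helper are all made up for illustration, and "sanity" is defined crudely as "no step of the plan examines more than a fixed number of rows, and the query touches only a few tables."

&lt;?php
// Sketch only: run EXPLAIN and apply a crude, self-defined "sanity" rule.
// Connection details and thresholds are assumptions, not recommendations.
$mysqli = new mysqli('localhost', 'user', 'pass', 'dbname');

function looksSane(mysqli $link, string $sql, int $maxRowsPerStep = 100000, int $maxTables = 3): bool
{
    $result = $link->query('EXPLAIN ' . $sql);
    if ($result === false) {
        // not even valid SQL (with mysqli exception reporting enabled,
        // which is the default in PHP 8.1+, this would throw instead)
        return false;
    }

    $steps = 0;                             // roughly one EXPLAIN row per table accessed
    while ($row = $result->fetch_assoc()) {
        $steps++;
        // "rows" is the optimizer's estimate of rows examined at this step
        if ((int) $row['rows'] > $maxRowsPerStep) {
            return false;
        }
    }
    return $steps <= $maxTables;
}

var_dump(looksSane($mysqli, "SELECT foo FROM table_bar WHERE id = 42"));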

SQL Injection

You mentioned that your motivation includes wanting to "avoid SQL injection attacks." If that's what's worrying you, the most important thing is to never concatenate user data into a query. SQL injection is possible whenever you concatenate user data, and it's very, very hard to detect. It's much better simply to prevent it entirely.

This code, frankly, makes my hair stand on end:

$clauses = [];
foreach ($conditions as $key => $value) {
    $clauses[] = "$key=$value";
}

return "WHERE " . implode(" AND ", $clauses);

There's no way to make that safe enough or to sanity-check it enough. You might think, "Yeah, but all of the conditions should contain only digits," or something similarly easy to validate, but you cannot safely rely on that. What happens when you modify your code next year, or next week, or tomorrow, and add a string parameter? Instant vulnerability.

You need to use prepared statements, rather than concatenating variables into your query. Simply escaping your variables is not enough. See How can I prevent SQL injection in PHP?.
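Since the class in the question already uses mysqli, here is a rough sketch of the same loop reworked around a mysqli prepared statement. The connection constants mirror the question's config.php; the users table and the $conditions array are made up for illustration.

&lt;?php
require_once './config.php';

// Sketch only: build the WHERE clause with placeholders and bind the values.
$link = new mysqli(DBHOST, DBUSER, DBPASS, DBNAME);

$conditions = ['age' => 30, 'city' => 'Berlin'];

$clauses = [];
foreach (array_keys($conditions) as $col) {
    $clauses[] = "`$col` = ?";          // column names still need whitelisting
}
$sql = 'SELECT * FROM `users`'
     . ($clauses ? ' WHERE ' . implode(' AND ', $clauses) : '');

$stmt = $link->prepare($sql);
if ($conditions) {
    // bind every value as a string; MySQL coerces the types as needed
    $types  = str_repeat('s', count($conditions));
    $values = array_values($conditions);
    $stmt->bind_param($types, ...$values);
}
$stmt->execute();
$result = $stmt->get_result();          // requires the mysqlnd driver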

Some Notes on Application Design

Note that this is typically something you do before deploying queries to production, not on the fly. If you're building a tool that allows users to build their own queries, some on-the-fly evaluation of the queries may be unavoidable.

But if all you're dealing with is multiple conditions in the WHERE clause, then queries will be fast (and you won't need to use EXPLAIN) as long as two things are true:

  1. you don't use subqueries, like ... WHERE id IN (SELECT id from OtherTable WHERE ...) ..., and
  2. you have appropriate indexes, as sketched after this list. (Again, though, this is something you can anticipate at development time in >99% of cases.)
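By way of illustration for the second point, the table, column, and index names here are hypothetical, not taken from the question:

&lt;?php
// Hypothetical example: a filter on `created_at` benefits from an index on
// that column; EXPLAIN should then report the index in its `key` column.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->exec('CREATE INDEX idx_orders_created_at ON orders (created_at)');

$plan = $pdo->query("EXPLAIN SELECT * FROM orders WHERE created_at > '2018-01-01'")
            ->fetch(PDO::FETCH_ASSOC);
echo $plan['key'];   // expected: idx_orders_created_at once the index exists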

Relevant "War Story" to Hopefully Ease Some of Your Fears

I once wrote a tool that allowed all kinds of complex queries to be built and run against MySQL on a database with several million rows in each of the major tables. The queries were mostly straightforward WHERE conditions, like WHERE lastOrder > '2018-01-01', along with a few (mostly hard-coded) JOIN and subquery possibilities. I just indexed aggressively and never needed to EXPLAIN anything; it never really hit any performance bottlenecks.

Upvotes: 2
