Reputation: 18529
I have an optional condition in WHERE
clause in a SQL query. If the parameter opt_y
is supplied by user, the condition cond2 <> opt_y
is checked, else it's skipped. The other condition (cond1 = x
) remains fixed.
Among the following methods, which is more efficient & secure (avoid SQL injection) . I'm using Django raw query manager to run my SQL (I understand performance depends on the case, but if there's some obvious pitfall, like the db engine can't optimize a particular technique or can't use the cache effectively, please highlight). Ofcourse, if there's a better method, please share it.
Intended query
MyTable.objects.raw("""
SELECT id, RANK() OVER (PARTITION BY name ORDER BY age)
FROM mytable
WHERE cond1 = %s AND cond2 <> %s """, [x, opt_y])
)
Method 1: CASE statement
MyTable.objects.raw("""
SELECT id, RANK() OVER (PARTITION BY name ORDER BY age)
FROM mytable
WHERE cond1 = %s AND
CASE WHEN %s IS NULL THEN true ELSE cond2 <> %s END""", [x, opt_y, opt_y])
)
Method 2: Combining strings
query = """SELECT id, RANK() OVER (PARTITION BY name ORDER BY age)
FROM mytable
WHERE cond1 = %s """
params = [x]
if opt_y:
q += """AND cond2 <> %s"""
params.append(opt_y)
MyTable.objects.raw(query, params)
Method 3: Logical operation
MyTable.objects.raw("""
SELECT id, RANK() OVER (PARTITION BY name ORDER BY age)
FROM mytable
WHERE cond1 = %s AND (%s IS NULL OR cond2 <> %s)""", [x, opt_y, opt_y])
)
Update, adding 1 more method :
Method 4: This is more of a hack
if not opt_y:
opt_y = -1 #Or some value that will NEVER appear in a column
MyTable.objects.raw("""
SELECT id, RANK() OVER (PARTITION BY name ORDER BY age)
FROM mytable
WHERE cond1 = %s AND cond2 <> %s""", [x, opt_y])
)
Using Django 1.6 & PostgreSQL 9.3
Upvotes: 4
Views: 2537
Reputation: 324345
The CASE
or logical operation are fine if you don't mind the verbosity - however, they won't get optimized out if you're using server-side prepared statements so they may result in non-optimal plan choices.
If you're using client-side parameterised queries, where the client driver substitutes the parameters, they'll be fine. You can tell which you're using by looking at the PostgreSQL query logs - if it logs your statements like:
... WHERE $1 = 'fred' AND ...
then you're using server-side parameter binding.
Because of that, sadly, adding the predicate to the SQL string and adding an extra parameter to the query params list is likely to be the most efficient way for many applications. It's one of the more horrible things about SQL IMO.
If you aren't interested in the guts of PostgreSQL, stop reading now.
In case you're curious about how to tell whether a particular construct gets optimized out, you can examine PostgreSQL's low level query parse tree, rewrite, and query plan structures. If you're interested in optimizer effects the query plan structure is what you want.
Say I'm interested in whether, for table:
Table "public.manufacturers"
Column | Type | Modifiers
---------+-----------------------+------------------------------------------------------------
id | integer | not null default nextval('manufacturers_id_seq'::regclass)
name | character varying(30) | not null
country | character varying(40) |
Indexes:
"manufacturers_pkey" PRIMARY KEY, btree (id)
"manufacturers_name_key" UNIQUE CONSTRAINT, btree (name)
a (rather stupid) query like:
select * from manufacturers
where case when null is null then true else id = id end
order by id;
has the CASE optimized out or not. I would:
SET debug_print_plan = on;
SET client_min_messages = debug1;
select * from manufacturers
where case when null is null then true else id = id end
order by id;
and psql
would print:
LOG: statement: select * from manufacturers where case when null is null then true else id = id end order by id;
LOG: plan:
DETAIL: {PLANNEDSTMT
:commandType 1
:queryId 0
:hasReturning false
:hasModifyingCTE false
:canSetTag true
:transientPlan false
:planTree
{INDEXSCAN
:startup_cost 0.15
:total_cost 53.85
:plan_rows 380
:plan_width 180
:targetlist (
{TARGETENTRY
:expr
{VAR
:varno 1
:varattno 1
:vartype 23
:vartypmod -1
:varcollid 0
:varlevelsup 0
:varnoold 1
:varoattno 1
:location 7
}
:resno 1
:resname id
:ressortgroupref 1
:resorigtbl 104875
:resorigcol 1
:resjunk false
}
{TARGETENTRY
:expr
{VAR
:varno 1
:varattno 2
:vartype 1043
:vartypmod 34
:varcollid 100
:varlevelsup 0
:varnoold 1
:varoattno 2
:location 7
}
:resno 2
:resname name
:ressortgroupref 0
:resorigtbl 104875
:resorigcol 2
:resjunk false
}
{TARGETENTRY
:expr
{VAR
:varno 1
:varattno 3
:vartype 1043
:vartypmod 44
:varcollid 100
:varlevelsup 0
:varnoold 1
:varoattno 3
:location 7
}
:resno 3
:resname country
:ressortgroupref 0
:resorigtbl 104875
:resorigcol 3
:resjunk false
}
)
:qual <>
:lefttree <>
:righttree <>
:initPlan <>
:extParam (b)
:allParam (b)
:scanrelid 1
:indexid 104879
:indexqual <>
:indexqualorig <>
:indexorderby <>
:indexorderbyorig <>
:indexorderdir 1
}
:rtable (
{RTE
:alias <>
:eref
{ALIAS
:aliasname manufacturers
:colnames ("id" "name" "country")
}
:rtekind 0
:relid 104875
:relkind r
:lateral false
:inh false
:inFromCl true
:requiredPerms 2
:checkAsUser 0
:selectedCols (b 9 10 11)
:modifiedCols (b)
}
)
:resultRelations <>
:utilityStmt <>
:subplans <>
:rewindPlanIDs (b)
:rowMarks <>
:relationOids (o 104875)
:invalItems <>
:nParamExec 0
}
It takes some practice and understanding of PostgreSQL's innards to read the plan tree. Don't stress if most of it makes no sense at all. The main thing of interest here is that the qual
(qualifiers, or where clause) for the index scan being used to read the table is empty. PostgreSQL hasn't just optimized out the CASE
, it's noticed that id = id
is always true
and optimized out the where clause entirely.
Upvotes: 4