When I use the redis-py client to run a full-text search for a user-provided `query_str` against only the field `text` on a Redis Stack server, I am afraid that string interpolation would expose me to query injection. Here `query_str` is not written in Redis query syntax; it is the literal text to search for. What is the best practice for avoiding query injection in this case?
To be precise, this issue should not depend on the client language, but I will demonstrate it in Python.
```python
from redis import Redis
from redis.commands.search.query import Query

client = Redis.from_url("redis://localhost:6379")


def search_1(query_str: str):
    # User input is interpolated straight into the query syntax.
    query = Query(f"@text:{query_str}")
    return client.ft("idx:test").search(query)


def search_2(query_str: str):
    # User input is sent separately via PARAMS; $query_str refers to it.
    params = {"query_str": query_str}
    query = Query("@text:$query_str").dialect(2)  # parameters need dialect 2
    return client.ft("idx:test").search(query, params)
```
In `search_1()`, `query_str` is concatenated directly into the query argument of the `FT.SEARCH` command. So if the user provided `query_str = "query_for_text @other_field:query_for_other_field"`, the command actually executed would look like `FT.SEARCH idx:test "@text:query_for_text @other_field:query_for_other_field"`. This deviates from my specification to search only the field `text`, because it also searches another field.
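To make the problem concrete without a running server, here is a minimal pure-Python sketch that reproduces the interpolation done in `search_1()` and lists which fields the resulting query string targets. `build_query_1` and `referenced_fields` are hypothetical helpers, and the `@field:` regex is only a rough approximation of the query syntax, not a real parser.

```python
import re

# Rough pattern for a field modifier such as "@text:" (approximation only).
FIELD_REF = re.compile(r"@(\w+)\s*:")


def build_query_1(query_str: str) -> str:
    """Same interpolation as search_1(): user text becomes query syntax."""
    return f"@text:{query_str}"


def referenced_fields(query: str) -> set:
    """Return the field names a query string appears to target."""
    return set(FIELD_REF.findall(query))


malicious = "query_for_text @other_field:query_for_other_field"
q = build_query_1(malicious)
print(q)                             # @text:query_for_text @other_field:query_for_other_field
print(sorted(referenced_fields(q)))  # ['other_field', 'text']
```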
I believe that `search_2()`, which uses the `PARAMS` argument of `FT.SEARCH`, is the solution. The `FT.SEARCH` documentation says of `PARAMS` that

> Each such reference in the search query to a parameter name is substituted by the corresponding parameter value.

and

> You cannot reference parameters in the query string where concrete values are not allowed, such as in field names, for example, `@loc`.
The documentation is not very explicit on this, but I believe that "substituted" does not mean string interpolation: the whole parameter value is treated as a single "phrase" to search for, which my experiments also confirm. I would like to confirm whether this is the case.
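At the command level, the difference in `search_2()` is that the user text never becomes part of the query string: it travels as a separate argument after `PARAMS`. The sketch below lays the argument list out by hand to show this. `build_ft_search_args` is a hypothetical helper; redis-py's own `search()` builds a similar list but may add other arguments (such as `LIMIT`). The sketch does not answer how the server tokenizes the substituted value, it only shows that the query text itself stays fixed.

```python
from typing import Dict, List


def build_ft_search_args(index: str, query: str, params: Dict[str, str],
                         dialect: int = 2) -> List[str]:
    """Hypothetical helper: lay out FT.SEARCH arguments with a PARAMS clause."""
    args = ["FT.SEARCH", index, query]
    if params:
        # PARAMS is followed by the count of name/value items, then the pairs.
        args += ["PARAMS", str(2 * len(params))]
        for name, value in params.items():
            args += [name, value]  # raw user input is its own argument
    args += ["DIALECT", str(dialect)]
    return args


args = build_ft_search_args(
    "idx:test", "@text:$query_str",
    {"query_str": "query_for_text @other_field:query_for_other_field"},
)
print(args)
```

If you want to send exactly such a list, redis-py's generic `client.execute_command(*args)` passes the arguments through unchanged.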
I have indeed looked at "Redis security" and "Is it possible to run a string injection attack on a redis query?". I believe my question is different from the injection that the Redis protocol prevents. As I understand it, that is injection into the message the client sends to the server (e.g. a user putting escape characters into the input, assuming the protocol relied on escape characters, which it does not).
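The protocol-level point can be made concrete: RESP sends each command argument as a length-prefixed bulk string, so the server receives the user's text as one opaque argument whatever characters it contains. The minimal RESP2 encoder below (`encode_command` is a hypothetical helper, not redis-py API) illustrates this; the injection in `search_1()` happens later, when the search module parses that single argument as query syntax.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP2 array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # The byte-length prefix, not any escaping, delimits the argument.
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


print(encode_command("FT.SEARCH", "idx:test", "@text:a b"))
```

Even input containing `\r\n` stays inside its own bulk string, because the server reads exactly the announced number of bytes.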
In a GitHub issue, besides using `PARAMS`, there are also solutions that manually escape the input `query_str`. So what is the best practice for handling unsanitized user input used as part of the query?
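For comparison, here is a minimal sketch of the manual-escaping approach. It assumes the special characters are the punctuation listed in the RediSearch tokenization documentation (`,.<>{}[]"':;!@#$%^&*()-+=~`) plus `|` and the backslash itself; verify that list against your server version. `escape_term` and `build_escaped_text_query` are hypothetical names. Each whitespace-separated word is escaped on its own so the words still act as separate terms, and the result is wrapped in `@text:(...)`.

```python
import re

# Assumed set of characters with special meaning in the query syntax.
_SPECIAL_CHARS = ",.<>{}[]\"':;!@#$%^&*()-+=~|\\"
_SPECIAL = re.compile("[" + re.escape(_SPECIAL_CHARS) + "]")


def escape_term(term: str) -> str:
    """Prefix every special character in one term with a backslash."""
    return _SPECIAL.sub(lambda m: "\\" + m.group(0), term)


def build_escaped_text_query(user_input: str) -> str:
    """Build an '@text:(...)' query from untrusted input."""
    terms = [escape_term(t) for t in user_input.split()]
    if not terms:
        raise ValueError("empty search input")
    return "@text:(" + " ".join(terms) + ")"


print(build_escaped_text_query("opener) @injection:x (closer"))
# @text:(opener\) \@injection\:x \(closer)
```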
1. Build the query as `f"@text:({query_str})"`, with the parentheses included.
2. Strip parentheses from the user input first: `query_str = query_str.replace('(', '').replace(')', '')`.
3. Use `DIALECT 2`.
With `DIALECT 2`, no other sub-query is allowed inside the parentheses of a text expression; the expression can only be an intersection or a union of terms.
By stripping all parentheses, you eliminate the possibility that someone closes the group early and injects a sub-query with input like `opener) @injection:... (closer`.
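Putting the three points above together, a minimal sketch might look like this. `build_grouped_text_query` is a hypothetical name, and only the string building is shown, so it runs without a server; the empty-input check is an added assumption, since `@text:()` would not be a useful query.

```python
def build_grouped_text_query(query_str: str) -> str:
    """Wrap untrusted input in @text:(...) after removing parentheses."""
    # With both parentheses removed, the input cannot close the group early.
    cleaned = query_str.replace("(", "").replace(")", "")
    if not cleaned.strip():
        raise ValueError("empty search input")
    return f"@text:({cleaned})"


print(build_grouped_text_query("opener) @injection:... (closer"))
# @text:(opener @injection:... closer)
```

In redis-py the result would then be sent as `Query(build_grouped_text_query(query_str)).dialect(2)`, relying on dialect 2 to restrict what may appear inside the parentheses.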
Beyond that, you should parse the user input further. It is unlikely that the raw input will match what users want to find unless you parse it. Do you want to put `|` between terms so that a document matches if it contains at least one of them, rather than all of them? Should some terms be marked as optional so they don't disqualify a document that lacks them?
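A minimal sketch of that kind of parsing, under a deliberately strict, hypothetical policy: keep only runs of word characters as terms (dropping every operator character), then either join them with `|` to match any term, or prefix some with `~` to mark them optional. `extract_terms`, `any_term_query` and `optional_terms_query` are illustrative names, not part of redis-py.

```python
import re
from typing import Iterable, List


def extract_terms(user_input: str) -> List[str]:
    """Keep only runs of word characters; all punctuation is discarded."""
    return re.findall(r"\w+", user_input)


def any_term_query(terms: Iterable[str]) -> str:
    """Match documents whose text field contains at least one term."""
    terms = list(terms)
    if not terms:
        raise ValueError("no searchable terms")
    return "@text:(" + "|".join(terms) + ")"


def optional_terms_query(required: Iterable[str], optional: Iterable[str]) -> str:
    """Require some terms; '~' marks the rest as optional."""
    parts = list(required) + ["~" + t for t in optional]
    if not parts:
        raise ValueError("no searchable terms")
    return "@text:(" + " ".join(parts) + ")"


terms = extract_terms("redis (search) @evil:x")
print(terms)                                      # ['redis', 'search', 'evil', 'x']
print(any_term_query(["redis", "search"]))        # @text:(redis|search)
print(optional_terms_query(["redis"], ["search"]))  # @text:(redis ~search)
```

Whether dropping punctuation is acceptable depends on your data; for example, `extract_terms("e-mail")` splits into `['e', 'mail']`.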