Reputation: 1723
Are there any examples of how to pass parameters with an SQL query in Pandas?
In particular I'm using an SQLAlchemy engine to connect to a PostgreSQL database. So far I've found that the following works:
df = psql.read_sql(('select "Timestamp","Value" from "MyTable" '
'where "Timestamp" BETWEEN %s AND %s'),
db,params=[datetime(2014,6,24,16,0),datetime(2014,6,24,17,0)],
index_col=['Timestamp'])
The Pandas documentation says that params
can also be passed as a dict, but I can't seem to get this to work having tried for instance:
df = psql.read_sql(('select "Timestamp","Value" from "MyTable" '
'where "Timestamp" BETWEEN :dstart AND :dfinish'),
db,params={"dstart":datetime(2014,6,24,16,0),"dfinish":datetime(2014,6,24,17,0)},
index_col=['Timestamp'])
What is the recommended way of running these types of queries from Pandas?
Upvotes: 94
Views: 216864
Reputation: 23489
I was having trouble passing a large number of parameters when reading from a SQLite Table. Then it turns out since you pass a string to read_sql
, you can just use f-string. Tried the same with MSSQL pyodbc and it works as well.
For SQLite, it would look like this:
# write a sample table into memory
from sqlalchemy import create_engine
df = pd.DataFrame({'Timestamp': pd.date_range('2020-01-17', '2020-04-24', 10), 'Value1': range(10)})
engine = create_engine('sqlite://', echo=False)
df.to_sql('MyTable', engine);
# query the table using a query
tpl = (1, 3, 5, 8, 9)
query = f"""SELECT Timestamp, Value1 FROM MyTable WHERE Value1 IN {tpl}"""
df = pd.read_sql(query, engine)
If the parameters are datetimes, it's a bit more complicated but calling the datetime conversion function of the SQL dialect you're using should do the job.
start, end = '2020-01-01', '2020-04-01'
query = f"""SELECT Timestamp, Value1 FROM MyTable WHERE Timestamp BETWEEN STRFTIME("{start}") AND STRFTIME("{end}")"""
df = pd.read_sql(query, engine)
Upvotes: -1
Reputation: 139342
The read_sql
docs say this params
argument can be a list, tuple or dict (see docs).
To pass the values in the sql query, there are different syntaxes possible: ?
, :1
, :name
, %s
, %(name)s
(see PEP249).
But not all of these possibilities are supported by all database drivers, which syntax is supported depends on the driver you are using (psycopg2
in your case I suppose).
In your second case, when using a dict, you are using 'named arguments', and according to the psycopg2
documentation, they support the %(name)s
style (and so not the :name
I suppose), see http://initd.org/psycopg/docs/usage.html#query-parameters.
So using that style should work:
df = psql.read_sql(('select "Timestamp","Value" from "MyTable" '
'where "Timestamp" BETWEEN %(dstart)s AND %(dfinish)s'),
db,params={"dstart":datetime(2014,6,24,16,0),"dfinish":datetime(2014,6,24,17,0)},
index_col=['Timestamp'])
Upvotes: 133