foobarbaz
foobarbaz

Reputation: 508

Pass column values into selection for SQL query using pandas.io.sql

I have multiple sql queries I need to run (via pandas.io.sql / .read_sql) that have a very similar structure so I am attempting to parameterize them.

I am wondering if there is a way to pass in column values using .format (which works for strings).

My query (truncated to simplify this post):

sql= '''
SELECT DISTINCT 
    CAST(report_suite AS STRING) AS report_suite, post_pagename,
    COUNT(DISTINCT(CONCAT(post_visid_high,post_visid_low))) AS unique_visitors 
FROM 
    FOO.db
WHERE 
    date_time BETWEEN '{0}' AND '{1}'
    AND report_suite = '{2}'
GROUP BY 
    report_suite, post_pagename
ORDER BY 
    unique_visitors DESC
'''.format(*parameters)

What I would like to do, is be able to parameterize the COUNT(DISTINCT(CONCAT(post_visid_high, post_visid_low))) as Unique Visitors

like this somehow:

COUNT(DISTINCT({3})) as {'4'}

The problem I can't seem to get around is that in order to do this would require storing the column names as something other than a string to avoid the quotes. Is there any good ways around this?

Upvotes: 1

Views: 2024

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

Consider the following approach:

sql_dynamic_parms = dict(
  func1='CONCAT(post_visid_high,post_visid_low)',
  name1='unique_visitors'
)

sql= '''
SELECT DISTINCT 
    CAST(report_suite AS STRING) AS report_suite, post_pagename,
    COUNT(DISTINCT({func1})) AS {name1} 
FROM 
    FOO.db
WHERE 
    date_time BETWEEN %(date_from)s AND %(date_to)s
    AND report_suite = %(report_suite)s
GROUP BY 
    report_suite, post_pagename
ORDER BY 
    unique_visitors DESC
'''.format(**sql_dynamic_parms)

params = dict(
  date_from=pd.to_datetime('2017-01-01'),
  date_to=pd.to_datetime('2017-12-01'),
  report_suite=111
)

df = pd.read_sql(sql, conn, params=params)

PS you may want to read PEP-249 to see what kind of parameter placeholders are accepted

Upvotes: 2

Related Questions