Violet
Violet

Reputation: 361

Oracle SQL query with multiple conditions based on pandas dataframe

I have a dataframe in the following format:

PID Date1 Date2 Details
17750A 03/07/1960 06/07/2009 A1B3
17758X 03/07/1960 06/07/2009 A1B3
06/09/1961 11/05/2013 A2B2
28363D 20/11/1964 05/03/2019 A1A2
30050A 30/06/1961 18/07/2017 A1B3
04/11/1961 16/10/2008 A2B2

And an Oracle database with a table as follows:

ID DateA DateB Notes
17750A 03/07/1960 06/07/2009 A1B3
03/07/1960 06/07/2009 A1B3
20964Q 06/09/1961 11/05/2013 A2B2
28363D 20/11/1964 04/03/2019 A1A2
30/06/1961 19/07/2017 A1B3
10832Q 04/11/1961 17/10/2008 A2B2

I need to query the database to return another df containing any record where the ID matches a PID, or where (Date1, Date2) equals (DateA, DateB) - i.e. both dates in a df row match both dates in a table row.

So far, I've managed to achieve the first, but not the second.

pid_list = df['PID'].values
nvars = ','.join(f':{i}' for i in range(len(pid_list)))

sql_query = """
            SELECT
                gd.ID,
                gd.DateA,
                gd.DateB,
                gd.Notes
             FROM table1 as gd
            WHERE gd.ID in (%s)
            """ % nvars
            
df_result = pd_read_sql(sql_query, connection, params=pid_list)

How can I expand that to also match on the pair of dates? Is there a way to do this by passing a list of tuples as a param, rather than needing to iterate through pairs of dates? Something like:

sql_query = """
            SELECT
                gd.ID,
                gd.DateA,
                gd.DateB,
                gd.Notes
              FROM table1 as gd
             WHERE gd.pid in (%s)
                OR (Date1 = DateA AND Date2 = DateB)
            """ % nvars, dates

df_result = pd.read_sql(sql_query, connection, params=(pid_list, (Date1, Date2)))

I think the params passed to pd.read_sql() may need to be a dict, but am not sure how to structure this or how to reference the different entries in the SQL query without iterating.

Upvotes: 1

Views: 966

Answers (2)

Marmite Bomber
Marmite Bomber

Reputation: 21115

You will be always limited with the 1000 entry limit (which you may workaround with the split in sublists using OR).

For the date compare you may use multi-element IN list compare as illustrated in the following query

select * from tab
where id in ('17750A','20964Q') or
(dateA,dateB) in ( ( date'1960-07-03', date'2009-07-06'),
                   (date'1961-06-30' , date'2017-07-19'))

Upvotes: 1

Rustam Pulatov
Rustam Pulatov

Reputation: 665

Try selected all data for Query:

            SELECT
                gd.ID,
                gd.DateA,
                gd.DateB,
                gd.Notes
             FROM table1 as gd

and filter from your DataFrame. Don't execute query with tuple and IN conditions only 1000 values.

or two query and join two result

or create table in oracle and filter query oracle

Upvotes: 1

Related Questions