Reputation: 361
I have a dataframe in the following format:
PID | Date1 | Date2 | Details |
---|---|---|---|
17750A | 03/07/1960 | 06/07/2009 | A1B3 |
17758X | 03/07/1960 | 06/07/2009 | A1B3 |
06/09/1961 | 11/05/2013 | A2B2 | |
28363D | 20/11/1964 | 05/03/2019 | A1A2 |
30050A | 30/06/1961 | 18/07/2017 | A1B3 |
04/11/1961 | 16/10/2008 | A2B2 |
And an Oracle database with a table as follows:
ID | DateA | DateB | Notes |
---|---|---|---|
17750A | 03/07/1960 | 06/07/2009 | A1B3 |
03/07/1960 | 06/07/2009 | A1B3 | |
20964Q | 06/09/1961 | 11/05/2013 | A2B2 |
28363D | 20/11/1964 | 04/03/2019 | A1A2 |
30/06/1961 | 19/07/2017 | A1B3 | |
10832Q | 04/11/1961 | 17/10/2008 | A2B2 |
I need to query the database to return another df containing any record where the ID matches a PID, or where (Date1, Date2) equals (DateA, DateB) - i.e. both dates in a df row match both dates in a table row.
So far, I've managed to achieve the first, but not the second.
pid_list = df['PID'].values
nvars = ','.join(f':{i}' for i in range(len(pid_list)))
sql_query = """
SELECT
gd.ID,
gd.DateA,
gd.DateB,
gd.Notes
FROM table1 as gd
WHERE gd.ID in (%s)
""" % nvars
df_result = pd_read_sql(sql_query, connection, params=pid_list)
How can I expand that to also match on the pair of dates? Is there a way to do this by passing a list of tuples as a param, rather than needing to iterate through pairs of dates? Something like:
sql_query = """
SELECT
gd.ID,
gd.DateA,
gd.DateB,
gd.Notes
FROM table1 as gd
WHERE gd.pid in (%s)
OR (Date1 = DateA AND Date2 = DateB)
""" % nvars, dates
df_result = pd.read_sql(sql_query, connection, params=(pid_list, (Date1, Date2)))
I think the params passed to pd.read_sql() may need to be a dict, but am not sure how to structure this or how to reference the different entries in the SQL query without iterating.
Upvotes: 1
Views: 966
Reputation: 21115
You will be always limited with the 1000 entry limit (which you may workaround with the split in sublists using OR
).
For the date compare you may use multi-element IN
list compare as illustrated in the following query
select * from tab
where id in ('17750A','20964Q') or
(dateA,dateB) in ( ( date'1960-07-03', date'2009-07-06'),
(date'1961-06-30' , date'2017-07-19'))
Upvotes: 1
Reputation: 665
Try selected all data for Query:
SELECT
gd.ID,
gd.DateA,
gd.DateB,
gd.Notes
FROM table1 as gd
and filter from your DataFrame. Don't execute query with tuple and IN conditions only 1000 values.
or two query and join two result
or create table in oracle and filter query oracle
Upvotes: 1