codingknob
codingknob

Reputation: 11680

pass pandas dataframe as parameter in mysql query

I have a pandas dataframe df that looks something like:

df = pd.DataFrame({'SEC1':['IBM','CSCO','MSFT','AMZN' ], 'SEC2':['GOOG', 'INTC', 'ABX', 'CREE'], 'HOUR':[10 ,10 ,15, 12], 'Size':[100 ,200 ,50 ,500],'Price':[300 ,25 ,150, 80] })

df = df[['SEC1', 'SEC2', 'HOUR', 'Size', 'Price']]

I have a large mysql table (name=Table-B) which I want to do a left join to with my dataframe. I've been reading the pandas documentation but its not clear to me how I can do this without first reading (via read_sql/read_table) Table-B into a dataframe. I don't want to load a massive table into pandas just to then left join to df which will return a small fraction of the rows in Table-B. Is there a way to pass the dataframe df as an object/parameter in read_sql.

If I did this in mysql the code would look something like this where Table-A is my dataframe above:

select * from  Table-A a
left join Table-B b
on (a.sec1 = b.sec1)
where DATE_SUB(CURDATE(),INTERVAL 12 MONTH) <= dt
group by 1,2,3,4,5,6,7

Upvotes: 0

Views: 2102

Answers (1)

unutbu
unutbu

Reputation: 880449

You could load df into a database table, using df.to_sql:

df.to_sql(tablename, engine, if_exists='replace')

and then perform the join in the database, and read the result into a new DataFrame using read_sql:

sql = '''select * from  Table-A a
         left join Table-B b
         on (a.sec1 = b.sec1)
         where DATE_SUB(CURDATE(),INTERVAL 12 MONTH) <= dt
         group by 1,2,3,4,5,6,7'''
result = pd.read_sql(sql, engine)

For example, using SQLAlchemy:

import pandas as pd
import sqlalchemy as SA
import config

df = pd.DataFrame({'SEC1':['IBM','CSCO','MSFT','AMZN' ], 'SEC2':['GOOG', 'INTC', 'ABX', 'CREE'], 'HOUR':[10 ,10 ,15, 12], 'Size':[100 ,200 ,50 ,500],'Price':[300 ,25 ,150, 80] })
df = df[['SEC1', 'SEC2', 'HOUR', 'Size', 'Price']]

engine = SA.create_engine('mysql+oursql://{u}:{p}@{h}/{d}'.format(
    u=config.USER, p=config.PASS, h=config.HOST, d='test'))
tablename = 'Table-A'
df.to_sql(tablename, engine, if_exists='replace')

sql = '''select * from  Table-A a
         left join Table-B b
         on (a.sec1 = b.sec1)
         where DATE_SUB(CURDATE(),INTERVAL 12 MONTH) <= dt
         group by 1,2,3,4,5,6,7'''

result = pd.read_sql(sql, engine)

Upvotes: 1

Related Questions