Reputation: 187
i am doing a lot of sql to pandas and i have run in to the following challenge.
I have a dataframe, that looks like
UserID, AccountNo, AccountName
123, 12345, 'Some name'
...
What i would like to do is for each account number, i would like to add a column called total revenue which is gotten from a mysql database, som i am thinking of something like,
for accountno in df['AccountNo']:
df1 = pd.read_sql(('select sum(VBRK_NETWR) as sum from sapdata2016.orders where VBAK_BSARK="ZEDI" and VBRK_KUNAG = %s;') % accountno, conn)
And i need to expand the the dataframe such that
UserID, AccountNo, AccountName, TotalRevenue
123, 12345, 'Some name', df1
...
The code that i have so far (and is not working casts a getitem error)
sets3 = []
i=0
for accountno in df5['kna1_kunnr']:
df1 = pd.read_sql(('select sum(VBRK_NETWR) as sum from sapdata2016.orders where VBAK_BSARK="ZEDI" and VBRK_KUNAG = %s;') % accountno, conn)
df2 = pd.DataFrame([(df5['userid'][i], df5['kna1_kunnr'][i], accountno, df5['kna1_name1'][i], df1['sum'][0])], columns=['User ID', 'AccountNo', 'tjeck', 'AccountName', 'Revenue'])
sets3.append(df2)
i += 1
df6 = pd.concat(sets3)
This idea/code is not pretty, and i wonder if there is a better/nicer way to do it, any ideas?
Upvotes: 0
Views: 779
Reputation: 107567
Consider exporting pandas data to MySQL as a temp table then run an SQL query that joins your pandas data and an aggregate query for TotalRevenue. Then, read resultset into pandas dataframe. This approach avoids any looping.
from sqlalchemy import create_engine
...
# SQL ALCHEMY CONNECTION (PREFERRED OVER RAW CONNECTION)
engine = create_engine('mysql://user:pwd@localhost/database')
# engine = create_engine("mysql+pymysql://user:pwd@hostname:port/database") # load pymysql
df1.to_sql("mypandastemptable", con=engine, if_exists='replace')
sql = """SELECT t.UserID, t.AccountNo, t.AccountName, agg.TotalRevenue
FROM mypandastemptable t
LEFT JOIN
(SELECT VBRK_KUNAG as AccountNo
SUM(VBRK_NETWR) as TotalRevenue
FROM sapdata2016.orders
WHERE VBAK_BSARK='ZEDI'
GROUP BY VBRK_KUNAG) agg
ON t.AccountNo = agg.AccountNo)
"""
newdf = pd.read_sql(sql, con=engine)
Of course the converse is true as well, merging on two pandas dataframes of existing dataframe and the grouped aggregate query resultset:
sql = """SELECT VBRK_KUNAG as AccountNo
SUM(VBRK_NETWR) as TotalRevenue
FROM sapdata2016.orders
WHERE VBAK_BSARK='ZEDI'
GROUP BY VBRK_KUNAG
"""
df2 = pd.read_sql(sql, con=engine)
newdf = df1.merge(df2, on='AccountNo', how='left')
Upvotes: 1