Steffen Hvid
Steffen Hvid

Reputation: 187

SQL values to update pandas dataframe

i am doing a lot of sql to pandas and i have run in to the following challenge.

I have a dataframe, that looks like

UserID, AccountNo, AccountName
123,    12345,     'Some name'
...

What i would like to do is for each account number, i would like to add a column called total revenue which is gotten from a mysql database, som i am thinking of something like,

for accountno in df['AccountNo']:
    df1 = pd.read_sql(('select sum(VBRK_NETWR) as sum from sapdata2016.orders where VBAK_BSARK="ZEDI" and VBRK_KUNAG = %s;') % accountno, conn)

And i need to expand the the dataframe such that

UserID, AccountNo, AccountName, TotalRevenue
123,    12345,     'Some name', df1
...

The code that i have so far (and is not working casts a getitem error)

sets3 = []
i=0
for accountno in df5['kna1_kunnr']:
    df1 = pd.read_sql(('select sum(VBRK_NETWR) as sum from sapdata2016.orders where VBAK_BSARK="ZEDI" and VBRK_KUNAG = %s;') % accountno, conn)
    df2 = pd.DataFrame([(df5['userid'][i], df5['kna1_kunnr'][i], accountno, df5['kna1_name1'][i], df1['sum'][0])], columns=['User ID', 'AccountNo', 'tjeck', 'AccountName', 'Revenue'])
    sets3.append(df2)
    i += 1

df6 = pd.concat(sets3)

This idea/code is not pretty, and i wonder if there is a better/nicer way to do it, any ideas?

Upvotes: 0

Views: 779

Answers (1)

Parfait
Parfait

Reputation: 107567

Consider exporting pandas data to MySQL as a temp table then run an SQL query that joins your pandas data and an aggregate query for TotalRevenue. Then, read resultset into pandas dataframe. This approach avoids any looping.

from sqlalchemy import create_engine
...

# SQL ALCHEMY CONNECTION (PREFERRED OVER RAW CONNECTION)
engine = create_engine('mysql://user:pwd@localhost/database')
# engine = create_engine("mysql+pymysql://user:pwd@hostname:port/database") # load pymysql

df1.to_sql("mypandastemptable", con=engine, if_exists='replace')

sql = """SELECT t.UserID, t.AccountNo, t.AccountName, agg.TotalRevenue
         FROM mypandastemptable t
         LEFT JOIN 
            (SELECT VBRK_KUNAG as AccountNo
                    SUM(VBRK_NETWR) as TotalRevenue
             FROM sapdata2016.orders 
             WHERE VBAK_BSARK='ZEDI'
             GROUP BY VBRK_KUNAG) agg
         ON t.AccountNo = agg.AccountNo) 
"""

newdf = pd.read_sql(sql, con=engine)

Of course the converse is true as well, merging on two pandas dataframes of existing dataframe and the grouped aggregate query resultset:

sql = """SELECT VBRK_KUNAG as AccountNo
                SUM(VBRK_NETWR) as TotalRevenue
         FROM sapdata2016.orders 
         WHERE VBAK_BSARK='ZEDI'
         GROUP BY VBRK_KUNAG 
"""

df2 = pd.read_sql(sql, con=engine)

newdf = df1.merge(df2, on='AccountNo', how='left')

Upvotes: 1

Related Questions