agm
agm

Reputation: 327

Update multiple rows in MySQL with Pandas dataframe

I have worked on a dataframe (previously extracted from a table with SQLAlchemy), and now I want to retrieve the changes updating that table.

I have done it in this very unefficient way:

engine = sql.create_engine(connect_string)
connection = engine.connect()
metadata = sql.MetaData()

pbp = sql.Table('playbyplay', metadata, autoload=True, autoload_with=engine)

for i in range(1,len(playbyplay_substitutions)):
    query_update = ('update playbyplay set Player_1_Visitor = {0}, Player_2_Visitor = {1} ,Player_3_Visitor = {2} ,Player_4_Visitor = {3} ,Player_5_Visitor = {4} where id_match = {5} and actionNumber = {6}'.format(playbyplay_substitutions.loc[i,'Player_1_Visitor_y'], playbyplay_substitutions.loc[i,'Player_2_Visitor_y'], playbyplay_substitutions.loc[i,'Player_3_Visitor_y'], playbyplay_substitutions.loc[i,'Player_4_Visitor_y'], playbyplay_substitutions.loc[i,'Player_5_Visitor_y'], playbyplay_substitutions.loc[i,'id_match'],playbyplay_substitutions.loc[i,'actionNumber']))
    connection.execute(query_update)

playbyplay_substitutions is my dataframe, playbyplay is my table, and the rest are the fields that I want to update or the keys in my table. I am looking for a more efficient solution than the one that I currently have for SQLAlchemy integrated with MySQL.

Upvotes: 0

Views: 570

Answers (1)

Ilja Everilä
Ilja Everilä

Reputation: 52929

Consider using proper placeholders instead of manually formatting strings:

query_update = sql.text("""
    UPDATE playbyplay
    SET Player_1_Visitor = :Player_1_Visitor_y
      , Player_2_Visitor = :Player_2_Visitor_y
      , Player_3_Visitor = :Player_3_Visitor_y
      , Player_4_Visitor = :Player_4_Visitor_y
      , Player_5_Visitor = :Player_5_Visitor_y
    WHERE id_match = :id_match AND actionNumber = :actionNumber
    """)

# .iloc[1:] mimics the original for-loop that started from 1
args = playbyplay_substitutions[[
    'Player_1_Visitor_y', 'Player_2_Visitor_y', 'Player_3_Visitor_y',
    'Player_4_Visitor_y', 'Player_5_Visitor_y', 'id_match',
    'actionNumber']].iloc[1:].to_dict('record')

connection.execute(query_update, args)

If your driver is sufficiently clever, this allows it to prepare a statement once and reuse it over the data, instead of emitting queries one by one. This also avoids possible accidental SQL injection problems, where your data resembles SQL constructs when formatted as a string manually.

Upvotes: 2

Related Questions