NorthAfrican
NorthAfrican

Reputation: 135

How to upsert pandas DataFrame to MySQL with SQLAlchemy

I'm pushing data from a data-frame into MySQL, right now it is only adding new data to the table if the data does not exists(appending). This works perfect, however I also want my code to check if the record already exists then it needs to update. So I need it to append + update. I really don't know how to start fixing this as I got stuck....someone tried this before?

This is my code:

engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
                        .format(user="root",
                                pw="*****",
                                db="my_db"))
my_df.to_sql('my_table', con = engine, if_exists = 'append')

Upvotes: 3

Views: 3462

Answers (2)

Gord Thompson
Gord Thompson

Reputation: 123829

Similar to the approach used for PostgreSQL here, you can use INSERT … ON DUPLICATE KEY in MySQL:

with engine.begin() as conn:
    # step 0.0 - create test environment
    conn.execute(sa.text("DROP TABLE IF EXISTS main_table"))
    conn.execute(
        sa.text(
            "CREATE TABLE main_table (id int primary key, txt varchar(50))"
        )
    )
    conn.execute(
        sa.text(
            "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
        )
    )
    # step 0.1 - create DataFrame to UPSERT
    df = pd.DataFrame(
        [(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
    )

    # step 1 - create temporary table and upload DataFrame
    conn.execute(
        sa.text(
            "CREATE TEMPORARY TABLE temp_table (id int primary key, txt varchar(50))"
        )
    )
    df.to_sql("temp_table", conn, index=False, if_exists="append")

    # step 2 - merge temp_table into main_table
    conn.execute(
        sa.text(
            """\
            INSERT INTO main_table (id, txt) 
            SELECT id, txt FROM temp_table
            ON DUPLICATE KEY UPDATE txt = VALUES(txt)
            """
        )
    )

    # step 3 - confirm results
    result = conn.execute(
        sa.text("SELECT * FROM main_table ORDER BY id")
    ).fetchall()
    print(result)  # [(1, 'row 1 new text'), (2, 'new row 2 text')]

Upvotes: 2

Slava Rozhnev
Slava Rozhnev

Reputation: 10163

You can use next solution on DB side:

First: create table for insert data from Pandas (let call it test):

CREATE TABLE `test` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(100) NOT NULL,
  `capacity` INT(11) NOT NULL,
  PRIMARY KEY (`id`)
);

Second: Create table for resulting data (let call it cumulative_test) exactly same structure as test:

CREATE TABLE `cumulative_test` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(100) NOT NULL,
  `capacity` INT(11) NOT NULL,
  PRIMARY KEY (`id`)
);

Third: set trigger on each insert into the test table will insert ore update record in the second table like:

DELIMITER $$

CREATE
    /*!50017 DEFINER = 'root'@'localhost' */
    TRIGGER `before_test_insert` BEFORE INSERT ON `test` 
    FOR EACH ROW BEGIN
    DECLARE _id INT;
    
    SELECT id INTO _id
    FROM `cumulative_test` WHERE `cumulative_test`.`name` = new.name;
    
    IF _id IS NOT NULL THEN
        UPDATE cumulative_test
        SET `cumulative_test`.`capacity` = `cumulative_test`.`capacity` + new.capacity;
     ELSE 
        INSERT INTO `cumulative_test` (`name`, `capacity`) 
        VALUES (NEW.name, NEW.capacity);
    END IF; 
END;
$$

DELIMITER ;

So you will already insert values into the test table and get calculated results in the second table. The logic inside the trigger can be matched for your needs.

Upvotes: 2

Related Questions