Reputation: 135
I'm pushing data from a data-frame into MySQL, right now it is only adding new data to the table if the data does not exists(appending). This works perfect, however I also want my code to check if the record already exists then it needs to update. So I need it to append + update. I really don't know how to start fixing this as I got stuck....someone tried this before?
This is my code:
engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
.format(user="root",
pw="*****",
db="my_db"))
my_df.to_sql('my_table', con = engine, if_exists = 'append')
Upvotes: 3
Views: 3462
Reputation: 123829
Similar to the approach used for PostgreSQL here, you can use INSERT … ON DUPLICATE KEY
in MySQL:
with engine.begin() as conn:
# step 0.0 - create test environment
conn.execute(sa.text("DROP TABLE IF EXISTS main_table"))
conn.execute(
sa.text(
"CREATE TABLE main_table (id int primary key, txt varchar(50))"
)
)
conn.execute(
sa.text(
"INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
)
)
# step 0.1 - create DataFrame to UPSERT
df = pd.DataFrame(
[(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
)
# step 1 - create temporary table and upload DataFrame
conn.execute(
sa.text(
"CREATE TEMPORARY TABLE temp_table (id int primary key, txt varchar(50))"
)
)
df.to_sql("temp_table", conn, index=False, if_exists="append")
# step 2 - merge temp_table into main_table
conn.execute(
sa.text(
"""\
INSERT INTO main_table (id, txt)
SELECT id, txt FROM temp_table
ON DUPLICATE KEY UPDATE txt = VALUES(txt)
"""
)
)
# step 3 - confirm results
result = conn.execute(
sa.text("SELECT * FROM main_table ORDER BY id")
).fetchall()
print(result) # [(1, 'row 1 new text'), (2, 'new row 2 text')]
Upvotes: 2
Reputation: 10163
You can use next solution on DB side:
First: create table for insert data from Pandas (let call it test):
CREATE TABLE `test` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100) NOT NULL,
`capacity` INT(11) NOT NULL,
PRIMARY KEY (`id`)
);
Second: Create table for resulting data (let call it cumulative_test) exactly same structure as test:
CREATE TABLE `cumulative_test` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100) NOT NULL,
`capacity` INT(11) NOT NULL,
PRIMARY KEY (`id`)
);
Third: set trigger on each insert into the test table will insert ore update record in the second table like:
DELIMITER $$
CREATE
/*!50017 DEFINER = 'root'@'localhost' */
TRIGGER `before_test_insert` BEFORE INSERT ON `test`
FOR EACH ROW BEGIN
DECLARE _id INT;
SELECT id INTO _id
FROM `cumulative_test` WHERE `cumulative_test`.`name` = new.name;
IF _id IS NOT NULL THEN
UPDATE cumulative_test
SET `cumulative_test`.`capacity` = `cumulative_test`.`capacity` + new.capacity;
ELSE
INSERT INTO `cumulative_test` (`name`, `capacity`)
VALUES (NEW.name, NEW.capacity);
END IF;
END;
$$
DELIMITER ;
So you will already insert values into the test table and get calculated results in the second table. The logic inside the trigger can be matched for your needs.
Upvotes: 2