hikamare
hikamare

Reputation: 334

Check if a record from database exist in a csv file

today I come to you for inspiration or maybe ideas how to solve a task not killing my laptop with massive and repetitive code.

I have a CSV file with around 10k records. I also have a database with respective records in it. I have four fields inside both of these structures: destination, countryCode,prefix and cost

Every time I update a database with this .csv file I have to check if the record with given destination, countryCode and prefix exist and if so, I have to update the cost. That is pretty easy and it works fine.

But here comes the tricky part: there is a possibility that the destination may be deleted from one .csv file to another and I need to be aware of that and delete that unused record from the database. What is the most efficient way of handling that kind of situation?

I really wouldn't want to check every record from the database with every row in a .csv file: that sounds like a very bad idea. I was thinking about some time_stamp or just a bool variable which will tell me if the record was modified during the last update of the DB BUT: there is also a chance that neither of params within the record change, thus: no need to touch that record and mark it as modified.

For that task, I use Python 3 and mysql.connector lib.

Any ideas and advice will be appreciated :)

Upvotes: 0

Views: 652

Answers (2)

Rick James
Rick James

Reputation: 142298

If the .CSV is a replacement for the existing table:

CREATE TABLE new LIKE real;
load the .csv into `new`  (Probably use LOAD DATA...)
RENAME TABLE real TO old, new TO real;
DROP TABLE old;

If you have good reason to keep the old table and patch it, then...

  1. load the .csv into a table
  2. add suitable indexes
  3. do one SQL to do deletes (no loop needed). It is probably a multi-table DELETE.
  4. do one sql to update the prices (no loop needed). It is probably a multi-table UPDATE.

You can probably do the entire task (either way) without touching Python.

Upvotes: 0

Lihi
Lihi

Reputation: 19

If you're keeping a time stamp why do you care if it's updated even if nothing was changed in the record? If the reason is that you want to save the date of the latest update you can add another column saving a time stamp of the last time the record appeared in the csv and afterwords delete all the records that the value of this column in them is smaller than the date of the last csv.

Upvotes: 0

Related Questions