Reputation: 1129
My job requires me to look up information in a long spreadsheet that's updated and sent to me once or twice a week. Sometimes the newest spreadsheet leaves off information that was in the previous one, forcing me to dig through several old spreadsheets to find what I need. I recently discovered that I could convert the spreadsheet to a CSV file and upload it to a database table. With a few lines of script, all I have to do is type in what I'm looking for and voilà!

Now I've just received the newest spreadsheet, and I'm wondering if I can simply import it on top of the old one. Each row has a unique number, which I've set as the primary key in the database. If I import the new file on top of the current data, will it just skip the rows where the primary key would be duplicated, or will it mess up my database?
Thought I'd ask the experts before I tried it. Thanks for your input!
Details: the spreadsheet consists of our clients. Each row contains a client's name, a unique ID number, and their address and contact info. I can set the column containing the unique ID as the primary key, then upload it. My concern is that there's nothing to signify a new row in a CSV file (I think). When I upload it, it gives me the option to skip duplicates, but will it skip the entire row, or just that one cell, causing my data to be placed in the wrong rows? It's an Apache server; I don't know what version of MySQL. I'm using 000webhost for this.
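To make this concrete, here's roughly the table I set up (the table and column names here are made up, but the layout matches the spreadsheet):

```sql
-- One row per client, keyed on the unique number from the spreadsheet.
CREATE TABLE clients (
    client_id INT NOT NULL PRIMARY KEY,  -- the unique ID number
    name      VARCHAR(100),
    address   VARCHAR(255),
    contact   VARCHAR(100)
);
```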
Upvotes: 0
Views: 1569
Reputation: 1129
There's a little checkbox near the bottom when you click Import that says 'ignore duplicates' or something like that. Simpler than I thought.
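For anyone scripting this instead of clicking through phpMyAdmin: that checkbox appears to map to MySQL's IGNORE behavior, which skips the whole conflicting row, never a single cell, so nothing shifts into the wrong columns. A sketch, assuming a table and CSV laid out like the one in the question (file name and column list are hypothetical):

```sql
-- LOAD DATA with IGNORE: any row whose client_id already exists is
-- skipped as a whole row; its other cells are discarded with it.
LOAD DATA LOCAL INFILE 'clients_latest.csv'
IGNORE INTO TABLE clients
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES  -- skip the header row, if the CSV has one
(client_id, name, address, contact);

-- The same idea for hand-written inserts:
INSERT IGNORE INTO clients (client_id, name, address, contact)
VALUES (1001, 'Ann Example', '1 Main St', 'ann@example.com');
```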
Upvotes: 1
Reputation: 4872
Higgs,
In database/ETL terminology, this problem is called deduplication, and the approach you choose is your deduplication strategy. There's no template answer for it, but I suggest these helpful readings:
Academic paper - Joint Deduplication of Multiple Record Types in Relational Data
Some open source tools:
Duke tool
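If your real requirement is "the newest spreadsheet wins" rather than "skip duplicates", one common strategy on the MySQL side is INSERT ... ON DUPLICATE KEY UPDATE, which overwrites the stored row instead of skipping the incoming one. A sketch, reusing the hypothetical table from the question:

```sql
-- "Newest data wins": if client_id already exists, overwrite the
-- stored name/address/contact with the values from the new import.
INSERT INTO clients (client_id, name, address, contact)
VALUES (1001, 'Ann Example', '1 New Address Rd', 'ann@example.com')
ON DUPLICATE KEY UPDATE
    name    = VALUES(name),
    address = VALUES(address),
    contact = VALUES(contact);
```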
Upvotes: 2