Reputation: 21
I was given a Mysql Innodb with 2 tables. One is 117+ million rows and has over 340 columns, including name, address, city, state and zip. The second table is 17+ million rows that has name, address, city, state and zip plus email. The data in the 1st and 2nd tables will not be added to or updated. There is a primary key on an id in each table. There are no other indexes defined.
I first created a contacts table from the 117+ million row table which has just the name, address, city, state, and zip making it significantly smaller. I wrote a php script to perform a search using each row from the smaller table of 17+ million records, trying to find a match in the contacts table. When one is found, I insert the id and email into a separate table. I cancelled it because it was taking approximately 86 seconds per search. With 17+ million records it will take forever to finish.
Here is my search query:
q= "SELECT id FROM GB_contacts WHERE LAST_NAME=\"$LAST\" and FIRST_NAME=\"$FIRST\" and MI=\"$MIDDLE\" and ADDRESS=\"$ADDRESS\" and ZIP=\"$ZIP\"".
My question is how can I do this faster? Should I make an index on name, address, and zip in the contacts table or should I index each column in the contacts table? Is there a faster way of doing this through mysql? I have read a whole bunch of different resources and am unsure which is the best way to go. Since these are such large tables, anything I try to do takes a very long time, so I am hoping to get some expert advice and avoid wasting days, weeks and months trying to figure this out. Thank-you for any helpful advice!
Upvotes: 0
Views: 2591
Reputation: 3037
did you tried typical join, if your join key is indexed it shouldn't take much time.
if its one time you can create indexes on join columns.
second step would be to load the records returned into new contacts table.
Upvotes: 0
Reputation: 708
The best way to do this is to create a clustered index on the fields that you are matching on. In this case, it might be a good idea to start with the zip code, followed by either first or last name first--last names are longer, so take longer to match, but are also more distinct, so it will leave less rows to do further matching (you will have to test which performs better). The strategy here is to tell mysql to look just in pockets of people, rather than search the entire database. While doing this, you got to be clever where to tell MySQL to begin narrowing it down. While you test, don't forget to use the EXPLAIN command.
Upvotes: 1