Dan

Reputation: 65

Matching 2 databases of names, given first, last, gender and DOB?

I collect a list of Facebook friends from my users including First, Last, Gender and DOB. I am then attempting to compare that database of names (stored as a table in MySQL) to another database comprised of similar information.

What would be the best way to conceptually link these results, with the second database being the much larger set of records (>500k rows)?

Here was what I was proposing:

Are there distributed computing concepts I am missing that might make this faster than a sequential MySQL approach? What other pitfalls might come up, given that avoiding a false positive is much more important than avoiding a missed record?

Upvotes: 0

Views: 398

Answers (2)

Buzz Moschetti

Reputation: 7588

If you want to operate on the entire set of data (as opposed to some interactive thing), this data set might be small enough to simply slurp into memory and go from there. Use a List to hold the records, then create a Map<String, List<Integer>> that, for each unique last name, points (via integer index) to every position in the list where that name occurs. This also sets you up to perform more complex matching logic without getting caught up trying to coerce SQL into doing it, especially since you are spanning two different physical databases.
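A minimal sketch of that in-memory index, assuming Java and assuming both tables have already been loaded into lists. The names `Person`, `NameMatcher` and `findUniqueMatch` are hypothetical, not from the question. The rule that an ambiguous hit counts as no match is also an assumption, chosen because the question prefers a missed record over a false positive:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical record type holding the four fields named in the question.
final class Person {
    final String first;
    final String last;
    final char gender;
    final LocalDate dob;

    Person(String first, String last, char gender, LocalDate dob) {
        this.first = first;
        this.last = last;
        this.gender = gender;
        this.dob = dob;
    }
}

// Hypothetical helper: indexes the larger data set by normalized last name.
final class NameMatcher {
    private final List<Person> records;
    private final Map<String, List<Integer>> byLastName = new HashMap<>();

    NameMatcher(List<Person> records) {
        this.records = records;
        for (int i = 0; i < records.size(); i++) {
            String key = normalize(records.get(i).last);
            byLastName.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
    }

    // Assumed normalization: trim whitespace and ignore case.
    static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the list index of the single exact match, or -1 when there is
    // no match or more than one (ambiguity is treated as "no match").
    int findUniqueMatch(Person p) {
        int found = -1;
        List<Integer> candidates =
                byLastName.getOrDefault(normalize(p.last), Collections.<Integer>emptyList());
        for (int idx : candidates) {
            Person c = records.get(idx);
            boolean same = normalize(c.first).equals(normalize(p.first))
                    && Character.toUpperCase(c.gender) == Character.toUpperCase(p.gender)
                    && c.dob.equals(p.dob);
            if (same) {
                if (found != -1) {
                    return -1; // second hit: ambiguous, so refuse to match
                }
                found = idx;
            }
        }
        return found;
    }
}
```

Build one `NameMatcher` over the 500k-row set, then call `findUniqueMatch` for each Facebook friend. Each lookup only scans the records that share a last name, and looser rules (nicknames, transposed dates) can be added inside the loop later if the strict version misses too much.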

Upvotes: 1

Manikandan Sigamani

Reputation: 1984

Yes, your idea seems like a better algorithm.

If performance is your concern, you can use caching to store the values that were recently searched. You can also index the results in a NoSQL database to get better read performance. If you have to use MySQL, read about polyglot persistence.

If simplicity is your concern, you can still index the data in a NoSQL database, so that over time you avoid the myriad joins that would spoil the experience for both the user and the developer.

There could be many more concerns, but it all depends on where you would like to use this: in a website, or for some data-analytics purpose.

Upvotes: 1
