arnaslu

Reputation: 1054

Indexing files to database

I need to index a lot of files and folders to database. There will be approx 1000 files/folders per workstation and about 100 workstations.

I will be constantly syncing these files to the database, so I need a quick query to check whether a file is already there. I'm thinking of hashing the full path of each file with MD5 and indexing that hash column in the database. Is this the right approach? Could a hash collision occur with 1-10 million records?
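For scale: with a 128-bit hash like MD5, the birthday bound puts the chance of any collision among 10 million distinct paths at roughly n²/(2·2¹²⁸) ≈ 10⁻²⁵, i.e. negligible. A minimal sketch (the path is a made-up example):

```python
import hashlib

def path_key(path: str) -> str:
    """Return a fixed-length hex key for a file path (MD5 of the UTF-8 bytes)."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()

# Hypothetical example path
key = path_key("/home/user/docs/report.txt")
print(key, len(key))  # 32 hex chars = 128 bits

# Birthday-bound estimate: P(any collision) ~ n^2 / (2 * 2^128)
n = 10_000_000
p = n * n / (2 * 2 ** 128)
print(f"collision probability ~ {p:.1e}")
```

The fixed 32-character key also makes the index column uniform in size, regardless of how long the original paths are.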

I have a choice between MySQL and MongoDB; I'm leaning towards MongoDB. Would you agree?

Upvotes: 2

Views: 329

Answers (1)

anon

Reputation:

The standard B+tree indexes that MySQL uses will be fine for your purposes; just make sure you're using InnoDB rather than MyISAM, to get row-level locking instead of table-level write locks.
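To illustrate the lookup pattern this implies (a unique index on the hash column, point lookup per file), here is a sketch using Python's built-in sqlite3 as a stand-in; the same schema and queries carry over to an InnoDB table in MySQL. Table and column names are made up:

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE files (
        path_hash CHAR(32) NOT NULL,  -- MD5 hex of the full path
        path      TEXT     NOT NULL,
        UNIQUE (path_hash)            -- backed by a B-tree index: fast point lookups
    )
""")

def path_key(path: str) -> str:
    return hashlib.md5(path.encode("utf-8")).hexdigest()

def is_indexed(path: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM files WHERE path_hash = ?", (path_key(path),)
    ).fetchone()
    return row is not None

def add_file(path: str) -> None:
    # INSERT OR IGNORE makes repeated syncs idempotent (MySQL: INSERT IGNORE)
    conn.execute(
        "INSERT OR IGNORE INTO files (path_hash, path) VALUES (?, ?)",
        (path_key(path), path),
    )

add_file("/ws1/docs/a.txt")
print(is_indexed("/ws1/docs/a.txt"))   # True
print(is_indexed("/ws1/docs/b.txt"))   # False
```

Storing the original path alongside the hash lets you resolve the (astronomically unlikely) collision case by comparing paths directly.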

If you are worried about collisions, use a hash with a longer digest so that collisions become vanishingly unlikely at your scale - try a 128-bit MurmurHash or a SHA variant instead.
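If you do switch hashes, Python's standard hashlib already ships the SHA family (MurmurHash would need a third-party package such as mmh3). A quick comparison of digest widths, using a made-up path:

```python
import hashlib

path = b"/home/user/docs/report.txt"  # hypothetical path

# Longer digests push the birthday bound further out
for name in ("md5", "sha1", "sha256"):
    h = hashlib.new(name, path).hexdigest()
    print(f"{name:6} {len(h) * 4:3d} bits  {h}")
```

Note the trade-off: a wider digest means a larger index column, so at 1-10 million rows MD5's 128 bits are already more than enough for this use case.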

Upvotes: 3
