Reputation: 23
I'd like to index files in a local database, but I do not understand how I can identify each individual file. For example, if I store the file path in the database, the entry will no longer be valid if the file is moved or deleted. I imagine there is some way of uniquely identifying a file no matter what happens to it, but I have had no success with Google.
This will be for *nix/Linux, and ext4 in particular, so please nothing specific to Windows or NTFS or anything like that.
Upvotes: 2
Views: 2566
Reputation: 333
The only fly in the ointment with inodes is that they can be reassigned after a delete (depending on the platform), so to be completely sure you need to record the file's creation timestamp as well as the device id. It's easier on Windows with its user file attributes.
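A minimal sketch of the idea above in Python: build an identity key from the device id and inode number via os.stat. One caveat: plain stat(2) on Linux does not expose the file's creation time (ext4 stores a crtime, but you need statx(2) to read it), so this sketch records st_ctime (last inode change) as a weaker stand-in.

```python
import os

def file_key(path):
    """Return an identity key for a file: (device id, inode number,
    last inode-change time).

    st_dev + st_ino uniquely identify a live file on a system, and
    survive renames/moves within the same filesystem. st_ctime is a
    stand-in for the true creation time, which plain stat(2) does not
    expose on Linux (statx(2) can report a birth time on ext4).
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_ctime)
```

Storing this tuple alongside each database entry lets you detect when an inode number has been recycled for a different file: if the recorded timestamp no longer matches, the entry is stale.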
Upvotes: 0
Reputation: 22421
If you do not consider files with the same content to be the same file, and only want to track moved/renamed files, then the inode number will do. Otherwise you will have to hash the content.
Upvotes: 0
Reputation: 5997
In addition to the excellent suggestion above, you might consider using the inode number property of the files, viewable in a shell with ls -i.

Using index.php on one of my boxes, ls -i yields:

196237 index.php

I then rename the file using mv index.php index1.php, after which the same ls -i yields:

196237 index1.php

(Note the inode number is the same.)
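The same rename experiment can be reproduced programmatically; a small sketch in Python (the file names mirror the ls -i example above and are otherwise arbitrary):

```python
import os
import tempfile

# Demonstrate that a file's inode number survives a rename.
d = tempfile.mkdtemp()
old = os.path.join(d, "index.php")
new = os.path.join(d, "index1.php")
open(old, "w").close()

inode_before = os.stat(old).st_ino
os.rename(old, new)  # equivalent to: mv index.php index1.php
inode_after = os.stat(new).st_ino

assert inode_before == inode_after
```

Note this only holds for moves within the same filesystem; moving a file across filesystems creates a new inode, which is one reason to record the device id as well.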
Upvotes: 7
Reputation: 6432
Try using a hashing scheme such as MD5, SHA-1, or SHA-2; these will allow you to match files up by content.

Basically, when you first create the index, you hash all the files you wish to add. The resulting digest is a reliable way to tell whether two files have the same content. Then, when you need to see whether a file is already in the index, hash it and compare the digest against your table of known hashes.
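A short sketch of the hashing step using Python's standard hashlib module; the function name and chunk size are my own choices, and SHA-256 is picked as a representative SHA-2 digest:

```python
import hashlib

def file_digest(path, algo="sha256", chunk_size=65536):
    """Hash a file's content in fixed-size chunks so large files
    never need to fit in memory; returns a hex string suitable
    for storing in an index table."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Two files with the same digest can safely be treated as identical for indexing purposes; accidental collisions for SHA-256 are astronomically unlikely.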
EDIT: As was said in the comments, it is a good idea to combine both pieces of data (the inode number and the content hash) so that you can track changes more accurately.
Upvotes: 2