user1614885

Reputation: 23

Uniquely identify files

I'd like to index files in a local database, but I do not understand how I can identify each individual file. For example, if I store the file path in the database, the entry will no longer be valid if the file is moved or deleted. I imagine there is some way of uniquely identifying files no matter what happens to them, but I have had no success with Google.

This will be for *nix/Linux and ext4 in particular, so please nothing specific to Windows or NTFS or anything like that.

Upvotes: 2

Views: 2566

Answers (4)

HKalsi

Reputation: 333

The only fly in the ointment with inodes is that they can be reassigned after a delete (depending on the platform), so you need to record the file creation timestamp as well as the device ID to be 100% sure. It's easier on Windows, with its user file attributes.
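A minimal Python sketch of recording that combination, assuming `os.stat` is an acceptable way to read the metadata. The names `FileIdentity` and `file_identity` are made up for illustration. Note the assumption about the timestamp: `st_birthtime` is only exposed by `os.stat` on some platforms, so the sketch stores `None` where it is missing rather than substituting another timestamp.

```python
import os
from typing import NamedTuple, Optional


class FileIdentity(NamedTuple):
    """Hypothetical record to store in the database instead of a path."""
    device: int                 # st_dev: ID of the device holding the file
    inode: int                  # st_ino: inode number, unique per device
    birthtime: Optional[float]  # creation time, or None if not exposed


def file_identity(path: str) -> FileIdentity:
    """Read the device ID, inode number and (if available) creation time."""
    st = os.stat(path)
    # st_birthtime is not present on every platform, so fall back to None.
    birthtime = getattr(st, "st_birthtime", None)
    return FileIdentity(st.st_dev, st.st_ino, birthtime)
```

Two records with the same device and inode but different creation times would then point to a reused inode rather than the same file.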

Upvotes: 0

Oleg V. Volkov

Reputation: 22421

If you do not consider files with the same content to be the same file, and only want to track moved or renamed files as the same, then the inode number will do. Otherwise you will have to hash the content.

Upvotes: 0

chucksmash

Reputation: 5997

In addition to the excellent suggestion above, you might consider using the inode number property of the files, viewable in a shell with ls -i.

Using index.php on one of my boxes:

ls -i

yields

196237 index.php

I then rename the file using mv index.php index1.php, after which the same ls -i yields:

196237 index1.php

(Note the inode number is the same)
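To use this from an indexer, one option is to store the device ID and inode number and, when the stored path no longer exists, walk the tree looking for a file that still has them. The following is only a sketch under that assumption; `find_by_inode` is a hypothetical helper name, and a full walk can be slow on large trees.

```python
import os
from typing import Optional


def find_by_inode(root: str, device: int, inode: int) -> Optional[str]:
    """Return the path under root whose (st_dev, st_ino) matches, if any."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are not followed to other files
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_dev == device and st.st_ino == inode:
                return path
    return None
```

For the example above, looking up the inode recorded for index.php would return the path of index1.php after the rename. If the file has hard links, the first matching name found is returned.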

Upvotes: 7

secretformula

Reputation: 6432

Try using a hashing scheme such as MD5, SHA-1, or SHA-2. These will allow you to match the files up by content.

Basically, when you first create the index, you hash all the files that you wish to add. The resulting hash string is pretty good at telling whether two files are different or the same. Then, when you need to see if a file is already in the index, hash it and compare the generated hash to your table of known hashes.

EDIT: As was said in the comments, it is a good idea to incorporate both pieces of data so that you can track changes more accurately.
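A rough sketch of that workflow in Python, assuming SHA-256 (a member of the SHA-2 family) and an in-memory dict standing in for the database table. `hash_file`, `build_index` and `lookup` are illustrative names, not part of any library.

```python
import hashlib
import os
from typing import Dict, Optional


def hash_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root: str) -> Dict[str, str]:
    """Map content hash -> path for every regular file under root."""
    index: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Files with identical content share a hash; the last one wins.
            index[hash_file(path)] = path
    return index


def lookup(index: Dict[str, str], path: str) -> Optional[str]:
    """Return the indexed path with the same content as path, if any."""
    return index.get(hash_file(path))
```

Because the key is the content, a moved or renamed file is still found, but an edited file will no longer match its old entry.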

Upvotes: 2
