NewBee

Reputation: 394

How to avoid duplicate file upload but keep the uploader unaware of it?

First of all, I apologize if the question is not clear; I explain it below.

For every uploaded file, I rename the file and record its hash (using the sha1_file function; please suggest any better or faster way to hash a file in PHP) in a separate DB table, then check the hash of every new file against that table to avoid duplicates.

This way, anyone uploading a duplicate file gets an error message and the file isn't uploaded.

My question is: is there any technique or algorithm by which I can prevent duplicate uploads while keeping the uploader unaware of it, so that they find the file in their account under a different name than the one already present? However, users must not be able to upload banned files by any means.

Upvotes: 0

Views: 2801

Answers (4)

HTMHell

Reputation: 6006

Yes, you should use xxhash which is much faster than sha1.

According to their benchmarks:

The benchmark uses SMHasher speed test, compiled with Visual 2010 on a Windows Seven 32-bits box. The reference system uses a Core 2 Duo @3GHz

In that benchmark, SHA1-32 reaches 0.28 GB/s and xxHash reaches 5.4 GB/s.

The PHP library only accepts a string as input, so you should use the command-line binary instead and have something like this in your PHP:

list($hash) = explode(" ", shell_exec("/path/to/xxHash/xxhsum " . escapeshellarg($filePath)));
echo $hash;

Installing xxhash:

$ wget https://codeload.github.com/Cyan4973/xxHash/tar.gz/v0.6.3 -O xx.tar.gz
$ tar xvzf xx.tar.gz
$ cd xxHash-0.6.3; make
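
If shelling out is not an option, PHP's own hash extension may be enough. Below is a minimal sketch, assuming PHP 8.1 or later, where `xxh64` shows up in `hash_algos()`; on older versions it falls back to `sha1`. The helper name `fastFileHash` is made up for illustration.

```php
<?php
// Hypothetical helper: hash a file with xxHash when the hash extension
// offers it (PHP 8.1+), otherwise fall back to SHA-1.
function fastFileHash(string $path): string
{
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read file: {$path}");
    }
    $algo = in_array('xxh64', hash_algos(), true) ? 'xxh64' : 'sha1';
    $hash = hash_file($algo, $path); // reads the file in chunks
    if ($hash === false) {
        throw new RuntimeException("Cannot hash file: {$path}");
    }
    // Prefix with the algorithm so hashes from different algorithms never mix.
    return $algo . ':' . $hash;
}
```

Keep in mind that xxHash is not a cryptographic hash. If users might deliberately craft colliding files, something like `hash_file('sha256', $path)` is the safer choice, at the cost of speed.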

Upvotes: 2

J. Dietz

Reputation: 33

You could create an extra table that links uploaded files (the entries in your table of file hashes) to user accounts. This table can hold an individual file name for every file belonging to a specific user, so the same file can have a different name per user. With current browsers you could also compute the file hash client side in JavaScript and upload the file only if no file with that hash exists in your database yet. If one does exist, you just link the user to the existing file instead.
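
The link table described above could look roughly like this. It is only a sketch: the table names (`files`, `user_files`), the column names, and the helper `linkUpload` are assumptions, and an in-memory SQLite database stands in for whatever database you actually use.

```php
<?php
// Sketch only: schema and names are assumptions, not the asker's real tables.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE files (id INTEGER PRIMARY KEY, hash TEXT UNIQUE NOT NULL)');
$db->exec('CREATE TABLE user_files (
    user_id INTEGER NOT NULL,
    file_id INTEGER NOT NULL REFERENCES files(id),
    display_name TEXT NOT NULL,
    UNIQUE (user_id, display_name)
)');

// Link a user to a file hash; returns true when the content was already stored.
function linkUpload(PDO $db, int $userId, string $hash, string $displayName): bool
{
    $find = $db->prepare('SELECT id FROM files WHERE hash = ?');
    $find->execute([$hash]);
    $fileId = $find->fetchColumn();
    $isDuplicate = ($fileId !== false);
    if (!$isDuplicate) {
        $db->prepare('INSERT INTO files (hash) VALUES (?)')->execute([$hash]);
        $fileId = $db->lastInsertId();
    }
    // Each user keeps their own name for the shared content.
    $db->prepare('INSERT INTO user_files (user_id, file_id, display_name) VALUES (?, ?, ?)')
       ->execute([$userId, $fileId, $displayName]);
    return $isDuplicate;
}
```

Calling `linkUpload($db, 2, $hash, 'beach.jpg')` for content another user already stored adds only a row to `user_files`, so the second uploader sees the file under their own name and never gets an error.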

Addition because of comment: if you want the same file to be accessible through multiple URLs, you can use something like Apache's mod_rewrite. I'm no expert with that, but you can look here for a first idea. You could update the .htaccess dynamically from your upload script.

Upvotes: -1

miknik

Reputation: 5941

Use an example like this to generate your sha1 hash client side before upload.

Save all your uploaded files with their hash as the filename, or have a database table which contains the hash and your local filename for each file. Also save the file size and content type.

Before the upload, submit the hash from the client to your server and check for it in the database. If it's not present, start the file upload. If it is present, fake the upload client side (or whatever you want to do) so the user thinks they have uploaded their file.
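
For uploads that do reach the server, the duplicate check could be sketched like this. Everything here is an assumption for illustration: the helper name `storeByHash`, the `$storageDir` layout (one file per hash), and the `$bannedHashes` list. The server hashes the file itself rather than trusting the hash sent by the client, so a banned file cannot slip through with a false hash.

```php
<?php
// Sketch only: $storageDir and $bannedHashes are hypothetical inputs.
// Returns ['hash' => string, 'duplicate' => bool]; throws for banned content.
function storeByHash(string $tmpPath, string $storageDir, array $bannedHashes): array
{
    $hash = sha1_file($tmpPath);
    if ($hash === false) {
        throw new RuntimeException('Could not read the uploaded file');
    }
    if (in_array($hash, $bannedHashes, true)) {
        unlink($tmpPath);
        throw new DomainException('This file is not allowed');
    }
    $target = rtrim($storageDir, '/') . '/' . $hash;
    $duplicate = file_exists($target);
    if ($duplicate) {
        unlink($tmpPath);          // content already stored, drop the new copy
    } else {
        rename($tmpPath, $target); // for real uploads, move_uploaded_file() is the usual call
    }
    return ['hash' => $hash, 'duplicate' => $duplicate];
}
```

The caller can ignore the `duplicate` flag when answering the user, which keeps the uploader unaware that nothing new was stored.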

Create a column in your users table for uploaded files. Store a serialised associative array in this column with hash => users_file_name as key => value pairs. Unserialize it and display it to each user so they keep their own file names, then use readfile to serve them the file under the correct name, selecting it server side by its hash.
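
Maintaining that serialised column might look like the sketch below. The helper name `addUserFile` is made up, and `$serialized` stands for the current value of the hypothetical "files uploaded" column, with an empty string meaning the user has no files yet.

```php
<?php
// Sketch only: adds one hash => file name pair to the serialised column value
// and returns the new value to write back to the users table.
function addUserFile(string $serialized, string $hash, string $name): string
{
    $files = $serialized === ''
        ? []
        // allowed_classes => false keeps unserialize() from creating objects.
        : unserialize($serialized, ['allowed_classes' => false]);
    if (!is_array($files)) {
        throw new UnexpectedValueException('Stored file list is not a valid array');
    }
    $files[$hash] = $name; // the same hash can carry a different name per user
    return serialize($files);
}
```

A serialised column is simple to start with, but it cannot be searched by hash in SQL, so a separate link table may be easier to live with as the data grows.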

As for your URL question: create a page for the downloads, but include the user in the URL as well, e.g. mysite.com/image.php?user=NewBee&image=filename.jpg

Query the database for files uploaded by NewBee and unserialize the array. Then:

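// $array is the unserialized hash => file name map for this user
$upload = $_GET['image'] ?? '';
$file = null; // stays null when the user has no file with that name
foreach ($array as $hash => $filename) {
    if ($filename === $upload) {
        $file = $hash; // hash identifying the stored copy
        break;
    }
}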

Search the database for the path to your copy of that file, then use readfile to output the same file with whatever name you want.

// Here $file must hold the path looked up from the database, not the hash.
header("Content-Description: File Transfer");
header("Content-Type: {$contenttype}");
// $filename is the user's own name for the file
header("Content-Disposition: attachment; filename=\"{$filename}\"");
header("Content-Length: " . filesize($file));
header("Pragma: public");
header("Expires: 0");
readfile($file);

Upvotes: 0

Sasha Pachev

Reputation: 5326

Just add some extra logic to your code, possibly using an extra table or extra fields in the existing table (it is up to you, there is more than one way to do it), that saves the file to an alternate location when you discover it is a duplicate instead of sending an error. I'm not sure, though, that this is a good idea from a UI design point of view: you are treating the user's input differently in a way the user will notice, without telling them why.

Upvotes: 0
