Reputation: 29
I've written a script to read .pdf files. Everything works fine if the filename is something like document.pdf, but sometimes I receive files named like document (1).pdf and then the script fails. The code is below.
Any ideas?
$dir = $_POST['dir'];
$fname = basename( $_FILES['filename']['name']);
$full_fname = $dir.$fname;
$command ='/usr/bin/pdftotext -layout '.$full_fname.' -';
$content = exec($command, $output, $returnvar);
$count = count($output);
if ($count == 0) {die("Sorry but cant open the file. Maybe the filename contains () or unwanted chars");}
Upvotes: 2
Views: 1291
Reputation: 15969
Do not use $_FILES['filename']['name'] but $_FILES['filename']['tmp_name'].
The name field contains the name the uploader claims the file has on their local file system. This name can be used (after proper output escaping) to give the user a reference.
The tmp_name field is the file name under which PHP stores the file in a temporary location after upload. This name is generated by the system and is free from user injection. Keep in mind that you have to copy or move the file (best using move_uploaded_file()) to a permanent storage place (best outside your document root) if you need the file later on.
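The move to permanent storage could be sketched as below. This is only an illustration under assumptions: the helper names make_storage_path() and store_upload(), the 32-character hex naming scheme, and the .pdf suffix are hypothetical and not part of the original answer. The point is that the destination name is generated on the server, so nothing from the client-supplied name reaches the file system.

```php
<?php
// Hypothetical helper: builds a server-generated destination path so that
// no part of the client-supplied file name ends up in it.
function make_storage_path(string $storageDir): string
{
    return rtrim($storageDir, '/') . '/' . bin2hex(random_bytes(16)) . '.pdf';
}

// Hypothetical helper: moves one entry of $_FILES out of PHP's temporary
// location. Returns the new path, or null if the upload failed or the
// file could not be moved.
function store_upload(array $file, string $storageDir): ?string
{
    if (($file['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK) {
        return null;
    }
    $target = make_storage_path($storageDir);
    // move_uploaded_file() returns false unless tmp_name is a real upload.
    return move_uploaded_file($file['tmp_name'], $target) ? $target : null;
}
```

Inside an upload handler this might be called as store_upload($_FILES['filename'], '/var/lib/myapp/uploads'), where the directory is an assumed location outside the document root.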
If you ever pass data to the command line, use escapeshellarg(). For instance:
$fname_escaped = escapeshellarg($_FILES['filename']['tmp_name']);
$command ='/usr/bin/pdftotext -layout '.$fname_escaped.' -';
(Yes, even the tmp name, which is most likely safe, has to be escaped, both to prevent potential future issues and to simplify code review.)
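To see why the original script fails on document (1).pdf, it may help to look at the command string itself. Unquoted, the space splits the path into two arguments and the shell treats the parentheses as syntax. The sketch below wraps the command construction in a hypothetical helper, build_pdftotext_command(), and the path is made up for illustration.

```php
<?php
// Hypothetical helper (not from the original posts): builds the pdftotext
// command line, quoting the path so the shell sees it as one argument.
function build_pdftotext_command(string $path): string
{
    return '/usr/bin/pdftotext -layout ' . escapeshellarg($path) . ' -';
}

// Made-up path containing a space and parentheses, like the failing upload.
echo build_pdftotext_command('/tmp/uploads/document (1).pdf'), PHP_EOL;
```

On a Unix-like system the path is wrapped in single quotes, so the spaces and parentheses reach pdftotext as literal characters.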
When printing to the user always escape data, depending on the context, using htmlentities(), json_encode or similar:
$fname_html = htmlentities($_FILES['filename']['name'], ENT_QUOTES);
echo "Thank you for uploading <i>{$fname_html}</i>.";
Such escaping should also be done on the result of an external program like pdftotext.
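As a rough sketch of that last point, the lines collected by exec() can be escaped before they are echoed into a page. The function name render_extracted_text() and the sample lines are hypothetical; $lines stands in for the $output array that exec() fills.

```php
<?php
// Hypothetical helper: joins the lines returned by the external program
// and escapes them for an HTML context before wrapping them in <pre>.
function render_extracted_text(array $lines): string
{
    $text_html = htmlentities(implode("\n", $lines), ENT_QUOTES);
    return "<pre>{$text_html}</pre>";
}

// Made-up sample standing in for pdftotext output that contains markup.
echo render_extracted_text(['Total < 100', '<script>alert(1)</script>']), PHP_EOL;
```

Text extracted from a PDF is attacker-controlled just like the file name, which is why it goes through the same escaping.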
When storing the name in a database, use the proper escaping routines or parameter binding.
Always, for all data coming from outside your program.
Upvotes: 2
Reputation: 3091
Use a regular expression. Suppose your $fname looks like document (1).pdf. Replacing each run of one or more whitespace characters with an underscore turns it into document_(1).pdf.
// Replace each run of whitespace with a single underscore
$fname = preg_replace('/\s+/', '_', $fname);
// Result: document_(1).pdf
// Or additionally remove everything except letters, digits, underscore, hyphen and dot
$fname = preg_replace("/[^a-z0-9\_\-\.]/i", '', $fname);
// Result: document_1.pdf
Upvotes: 1