Reputation: 53
I need a regex to extract the filename (incl. file extension) from the following string:
attachment; filename*=UTF-8''test.rar
or like this:
attachment; filename*=UTF-8''Epost%20-test.part01.rar
Target:
test.rar
Epost%20-test.part01.rar
How can I do this?
Note: I'm using preg_match for the extraction.
Upvotes: 4
Views: 7741
Reputation: 2462
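Assuming you have your Content-Disposition header as a string in $contentDisposition, the trick is to use parse_ini_string:
function getFilename(string $contentDisposition): ?string
{
    foreach (explode(';', $contentDisposition) as $p) { // one ";"-separated parameter at a time
        if (stripos($p, 'filename') !== false) {
            // Parse the parameter as an INI line; only the plain "filename" key is read
            $kv = parse_ini_string(trim($p));
            return $kv['filename'] ?? null;
        }
    }
    return null;
}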
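For reference, here is what parse_ini_string returns for a single quoted parameter. This is a minimal check that assumes the plain filename= form, not filename*=. The value plans.pdf is made up.

```php
<?php
// One Content-Disposition parameter, already split off at ";" and trimmed
$param = 'filename="plans.pdf"';

// Read as an INI line: key "filename", surrounding double quotes stripped
$kv = parse_ini_string($param);

echo $kv['filename'], "\n";
```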
Upvotes: 1
Reputation: 5521
I haven't seen any simple solution that works with all the variations mentioned here and on similar questions. Here is my solution to accomplish that.
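<?php
// Optional "*", optional UTF-8'' prefix, optional quotes; group 3 holds the name
preg_match('/filename(\*)?=(UTF-8\'\')?"?([^";]+)"?;?/', $_SERVER['HTTP_CONTENT_DISPOSITION'], $matches);
// rawurldecode() decodes %XX escapes without turning "+" into a space
$file_path = rawurldecode($matches[3]);
?>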
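You can try the pattern without a live request. This minimal sketch applies it to the two sample strings from the question. The helper name extractFilename and the constant FILENAME_PATTERN are hypothetical. The sketch decodes with rawurldecode(), which handles %XX escapes but leaves "+" alone. If you want the undecoded form shown in the question's targets, return $m[3] directly.

```php
<?php
// Same pattern as above: optional "*", optional UTF-8'' prefix, optional quotes.
// Capture group 3 holds the (possibly percent-encoded) file name.
const FILENAME_PATTERN = '/filename(\*)?=(UTF-8\'\')?"?([^";]+)"?;?/';

// Hypothetical helper: returns the decoded name, or null when nothing matches
function extractFilename(string $header): ?string
{
    if (preg_match(FILENAME_PATTERN, $header, $m) !== 1) {
        return null;
    }
    return rawurldecode($m[3]);
}

$samples = [
    "attachment; filename*=UTF-8''test.rar",
    "attachment; filename*=UTF-8''Epost%20-test.part01.rar",
];
foreach ($samples as $s) {
    echo extractFilename($s), "\n";
}
```

The same pattern also accepts the quoted form, e.g. attachment; filename="plans.pdf".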
That's it, but since we are probably going to write the file to disk, we can add some path sanitization and get the path parts.
$regex_array = array(
'/\.{2,}\//', //prevents changing directory to parent directories
'/^\/+/' //prevents using root directory or absolute path
);
$path_parts = pathinfo(preg_replace($regex_array, '', $file_path));
echo $path_parts['dirname'], "\n";
echo $path_parts['basename'], "\n";
echo $path_parts['extension'] ?? '', "\n"; // key is missing when the name has no extension
echo $path_parts['filename'], "\n";
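You can check the two sanitization patterns in isolation. This minimal sketch uses made-up paths, not the author's examples. The first pattern drops any run of two or more dots followed by a slash. The second drops leading slashes. pathinfo() then splits what is left. This is not a complete defence: backslash separators, for instance, are left untouched. If subdirectories are not needed at all, basename() is a simpler, stricter option.

```php
<?php
// Same two patterns as above: strip "../" style segments, then leading slashes
$regex_array = [
    '/\.{2,}\//',
    '/^\/+/',
];

// Hypothetical sample paths, chosen only to exercise the patterns
$clean1 = preg_replace($regex_array, '', '../../etc/passwd');
$clean2 = preg_replace($regex_array, '', '/var/www/upload.rar');

$parts = pathinfo($clean2);
echo $clean1, "\n";                                      // etc/passwd
echo $clean2, "\n";                                      // var/www/upload.rar
echo $parts['dirname'], "\n";                            // var/www
echo $parts['filename'], '.', $parts['extension'], "\n"; // upload.rar
```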
Here are some examples.
Input strings:
Output file paths with directory sanitization.
Upvotes: 1
Reputation: 2195
After several tries I understood (thanks, Julian Reschke) that a header value cannot be parsed correctly according to RFC 2616 in a simple function. It also takes a lot of tests to make sure an implementation is correct.
Since there was still no solution for that, I decided to publish a dedicated library with a ContentDisposition class.
composer require cardinalby/content-disposition
It can both generate (format) a value and parse a string. Examples of parsing:
use cardinalby\ContentDisposition\ContentDisposition;
$cd = ContentDisposition::parse('attachment; filename="plans.pdf"');
assert($cd->getType() === 'attachment');
assert($cd->getFilename() === 'plans.pdf');
assert($cd->getParameters() === ['filename' => 'plans.pdf']);
A header with both a filename and a filename* parameter:
use cardinalby\ContentDisposition\ContentDisposition;
$cd = ContentDisposition::parse(
'attachment; filename="EURO rates.pdf"; filename*=UTF-8\'\'%E2%82%AC%20rates.pdf'
);
assert($cd->getType() === 'attachment');
// Unicode version is preferable
assert($cd->getFilename() === '€ rates.pdf');
assert($cd->getParameters() === [
'filename' => 'EURO rates.pdf',
'filename*' => '€ rates.pdf'
]);
Upvotes: 3
Reputation: 38512
Try simply using a lookbehind:
$str = "attachment; filename*=UTF-8''test.rar";
// Lookbehind on both quotes; the hyphen is last in the class so it is literal
preg_match('/(?<=\'\')[a-zA-Z0-9 ,.()%-]*/', $str, $matches);
print_r($matches);
DEMO : https://www.regex101.com/r/yO9nQ4/1
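Inside a character class, a hyphen between two characters forms a range. So " -," covers everything from space to comma, including the single quote. A hyphen placed last is literal. Below is a short check, followed by a lookbehind on both quotes with the hyphen moved to the end. This is a sketch, not necessarily the pattern in the linked demo.

```php
<?php
// " -," is a range from space (0x20) to comma (0x2C); the quote (0x27) is inside it
var_dump(preg_match('/[ -,]/', "'"));   // int(1)

// With the hyphen last, the class holds only space, comma and hyphen
var_dump(preg_match('/[ ,-]/', "'"));   // int(0)
var_dump(preg_match('/[ ,-]/', "-"));   // int(1)

// Lookbehind on both quotes, literal hyphen at the end of the class
$str = "attachment; filename*=UTF-8''test.rar";
preg_match("/(?<='')[a-zA-Z0-9 ,.()%-]*/", $str, $m);
echo $m[0], "\n";                       // test.rar
```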
Upvotes: 0
Reputation: 6047
You need to give some more info. Is the first part always the same? Is the filename always at the end, right after ''?
--edit--
If you just need to remove the first part, then don't use a regexp:
$str = "attachment; filename*=UTF-8''test.rar";
$filename = substr($str, 29); // 29 = strlen("attachment; filename*=UTF-8''")
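This variant avoids the hard-coded offset by locating the two single quotes with strpos() instead. It is a sketch that assumes an empty language tag (UTF-8''name), as in both sample strings. A value such as UTF-8'en'name contains no '' and would give null here.

```php
<?php
$str = "attachment; filename*=UTF-8''Epost%20-test.part01.rar";

// Find the "''" that ends the charset/language prefix instead of hard-coding 29
$pos = strpos($str, "''");
$filename = ($pos === false) ? null : substr($str, $pos + 2);

echo $filename, "\n"; // Epost%20-test.part01.rar
```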
Upvotes: -3
Reputation: 59701
This should work for you:
<?php
$str = "attachment; filename*=UTF-8''test.rar";
preg_match_all("/\w+\.\w+/", $str, $output);
echo $output[0][0];
?>
Output:
test.rar
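On the question's second sample, the pattern above stops at the first word.word pair. That is why a different pattern is needed for that input:

```php
<?php
$str = "attachment; filename*=UTF-8''Epost%20-test.part01.rar";

// \w+\.\w+ needs word characters on both sides of a dot; "%20-" breaks the run,
// so the only hit is "test.part01" rather than the whole file name.
preg_match_all("/\w+\.\w+/", $str, $output);
print_r($output[0]);
```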
EDIT:
If the two single quotes are always in the string, you can grab everything after them with:
<?php
$str = "attachment; filename*=UTF-8''Epost%20-test.part01.rar";
preg_match_all("/[^']+$/", $str, $output); // everything after the last single quote
echo $output[0][0];
?>
Output:
Epost%20-test.part01.rar
Upvotes: 2