user2826075

Reputation: 53

Extracting filename from content disposition via PHP

I need a regex to extract the filename (incl. file extension) from the following string:

attachment; filename*=UTF-8''test.rar

or like this:

attachment; filename*=UTF-8''Epost%20-test.part01.rar

Target:

test.rar
Epost%20-test.part01.rar

How can I do this?

Note: I'm using preg_match for extracting

Upvotes: 4

Views: 7741

Answers (6)

Marco Marsala

Reputation: 2462

Assuming you have your Content-Disposition header as a string in $contentDisposition, one trick is to split it on semicolons and pass the part containing the filename to parse_ini_string:

$parts = explode(';', $contentDisposition);
foreach($parts as $p)
    if(stripos($p, 'filename') !== FALSE) {
        $kv = parse_ini_string($p);
        return $kv['filename'];
    }

Upvotes: 1

drolex

Reputation: 5521

I haven't seen any simple solution that works with all the variations mentioned here and in similar questions, so here is mine.

<?php
preg_match('/filename(\*)?=(UTF-8\'\')?"?([^";]+)"?;?/', $_SERVER['HTTP_CONTENT_DISPOSITION'], $matches);
$file_path = urldecode($matches[3]);
?>

That's it, but since we are probably going to write the file to disk, we can add some path sanitization and get the path parts.

$regex_array = array(
  '/\.{2,}\//', //prevents changing directory to parent directories
  '/^\/+/' //prevents using root directory or absolute path
);
$path_parts = pathinfo(preg_replace($regex_array, '', $file_path));
echo $path_parts['dirname'];
echo $path_parts['basename'];
echo $path_parts['extension'];
echo $path_parts['filename'], "\n";

Here are some examples.

Input strings:

  1. attachment; filename="newdir/image%20'1.jpg"
  2. attachment; filename*=UTF-8''newdir/image%20'2.jpg;
  3. attachment; filename*=UTF-8''///../usr/bin....//newdir/image%20'3.jpg;

Output file paths after directory sanitization:

  1. newdir/image '1.jpg
  2. newdir/image '2.jpg
  3. usr/bin/newdir/image '3.jpg
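
If only the file name itself is needed, the same pattern can be wrapped in a small function. This is a rough sketch, not part of the answer above: extractFilename is a made-up name, rawurldecode is swapped in for urldecode because urldecode also turns "+" into a space, and basename drops any directory part instead of sanitizing it.

```php
<?php
// Hypothetical helper (name and behavior are assumptions, not from the answer):
// extract the filename parameter, decode it, keep only the last path component.
function extractFilename(string $header): ?string
{
    // Same pattern as above: optional "*", optional UTF-8'' prefix, optional quotes.
    if (!preg_match('/filename(\*)?=(UTF-8\'\')?"?([^";]+)"?;?/', $header, $m)) {
        return null; // no filename parameter found
    }
    // rawurldecode() decodes %XX sequences but leaves "+" untouched.
    $decoded = rawurldecode($m[3]);
    // Normalize backslashes so basename() also strips Windows-style paths.
    return basename(str_replace('\\', '/', $decoded));
}

echo extractFilename("attachment; filename*=UTF-8''Epost%20-test.part01.rar"), "\n";
// prints: Epost -test.part01.rar
```

Note that when a header carries both filename and filename*, this sketch returns whichever appears first.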

Upvotes: 1

Cardinal

Reputation: 2195

After several attempts I understood (thanks, Julian Reschke) that it is not possible to parse this header value correctly according to RFC 2616 in a simple function. An implementation also needs a lot of tests to ensure it is correct.

Since there was still no solution for this, I decided to publish a dedicated library with a ContentDisposition class.

composer require cardinalby/content-disposition 

It can both generate/format a value and parse a string. Two examples of parsing:

use cardinalby\ContentDisposition\ContentDisposition;

$cd = ContentDisposition::parse('attachment; filename="plans.pdf"');
assert($cd->getType() === 'attachment');
assert($cd->getFilename() === 'plans.pdf');
assert($cd->getParameters() === ['filename' => 'plans.pdf']);

use cardinalby\ContentDisposition\ContentDisposition;

$cd = ContentDisposition::parse(
    'attachment; filename="EURO rates.pdf"; filename*=UTF-8\'\'%E2%82%AC%20rates.pdf'
    );
assert($cd->getType() === 'attachment');
// Unicode version is preferable
assert($cd->getFilename() === '€ rates.pdf');
assert($cd->getParameters() === [
    'filename' => 'EURO rates.pdf', 
    'filename*' => '€ rates.pdf'
]);

Upvotes: 3

A l w a y s S u n n y

Reputation: 38512

Try simply using a lookbehind:

$str = "attachment; filename*=UTF-8''test.rar";

preg_match('/(?<=\'\')[a-zA-Z0-9 ,.()%-]*/', $str, $matches);

print_r($matches);

DEMO : https://www.regex101.com/r/yO9nQ4/1

Upvotes: 0

Alex

Reputation: 6047

You need to give some more info. Is the first part always the same? Is the filename always at the end, right after ''?

--edit--

If you just need to remove the first part, then don't use a regexp:

$str = "attachment; filename*=UTF-8''test.rar";

$filename = substr($str, 29);  
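
The offset 29 only holds while the prefix is exactly attachment; filename*=UTF-8''. A slightly more forgiving sketch looks up the position of the two single quotes instead. filenameAfterDelimiter is a made-up name, and it assumes the filename is the last thing in the string.

```php
<?php
// Hypothetical helper: cut the string right after the first '' delimiter
// instead of relying on a hard-coded offset.
function filenameAfterDelimiter(string $header): ?string
{
    $pos = strpos($header, "''");
    if ($pos === false) {
        return null; // no '' delimiter in the header
    }
    // Skip the two quote characters and return the rest unchanged.
    return substr($header, $pos + 2);
}

echo filenameAfterDelimiter("attachment; filename*=UTF-8''test.rar"), "\n";
// prints: test.rar
```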

Upvotes: -3

Rizier123

Reputation: 59701

This should work for you:

<?php

    $str = "attachment; filename*=UTF-8''test.rar";

    preg_match_all("/\w+\.\w+/", $str, $output);

    echo $output[0][0];

?>

Output:

test.rar

EDIT:

If the two single quotes are always present in the string, you can grab everything after them with:

<?php

    $str = "attachment; filename*=UTF-8''Epost%20-test.part01.rar";

    preg_match_all("/[^\'\']+$/", $str, $output);

    echo $output[0][0];

?>

Output:

Epost%20-test.part01.rar 
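
Since the question mentions preg_match, the same idea can also be written with a single capture group. This is only a sketch: rawExtendedFilename is a made-up name, and it assumes the value has the form charset'language'value and ends at a semicolon or at the end of the string. The value is returned still percent-encoded, as in the question's target output.

```php
<?php
// Hypothetical helper: return the raw (still percent-encoded) filename* value.
function rawExtendedFilename(string $header): ?string
{
    // charset, quote, optional language, quote, then capture up to ";" or the end.
    if (preg_match("/filename\\*=[^']*'[^']*'([^;]+)/", $header, $m) === 1) {
        return trim($m[1]);
    }
    return null; // no filename* parameter found
}

echo rawExtendedFilename("attachment; filename*=UTF-8''Epost%20-test.part01.rar"), "\n";
// prints: Epost%20-test.part01.rar
```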

Upvotes: 2
