user5005768Himadree

Reputation: 1427

Faster way to correctly extract file name from Content-Disposition header in C++

I want to extract every valid form of file name from the "filename" attribute of the Content-Disposition HTTP header, as in the following examples:

Content-Disposition: attachment; filename="filename.jpg"
Content-Disposition: attachment; filename="file-2020-April.txt.vbs"

Moreover, the file name sometimes contains non-ASCII characters, and in that case the correct file name comes from the "filename*=" attribute, as in the following example (this is just an example, not actual data):

Content-Disposition: attachment; filename="??.txt"; filename*=UTF-8''日本.txt

I used the following string functions to extract the name from filename=" only:

std::string ContentDispositionHeader;  // assumed to hold the header line
std::string filename;
// find() returns std::string::npos on failure, so use std::size_t rather than int
const std::size_t startPos = ContentDispositionHeader.find('"');
const std::size_t endPos = ContentDispositionHeader.find_last_of('"');
if (startPos != std::string::npos && endPos > startPos)
{
    filename = ContentDispositionHeader.substr(startPos + 1, endPos - startPos - 1);
}

However, I need to write code that handles both file-naming cases (plain and UTF-8). Is there a faster way to extract the file names easily?

Upvotes: 1

Views: 1175

Answers (1)

Azeem

Reputation: 14627

I believe that you cannot get faster than O(n), where n is the length of the header, if that's what you are looking for. And this is what you're already doing.

The following example extracts the filenames from the headers in a similar fashion. It assumes that the quotes around the filename value are always present (refer to RFC 6266 for more on this) and that the UTF-8 form (filename*) always comes after the ASCII one when both are present. There may be more cases that you need to handle while parsing the header.

Here's the example (it requires C++17 for the if-with-initializer statements and structured bindings):

#include <iostream>
#include <string>
#include <vector>
#include <utility>

// Filenames: <ASCII, UTF-8>
using Filenames = std::pair<std::string, std::string>;

Filenames getFilename( const std::string& header )
{
    std::string ascii;

    const std::string q1 { R"(filename=")" };
    if ( const auto pos = header.find(q1); pos != std::string::npos )
    {
        const auto len = pos + q1.size();

        const std::string q2 { R"(")" };
        if ( const auto pos = header.find(q2, len); pos != std::string::npos )
        {
            ascii = header.substr(len, pos - len);
        }
    }

    std::string utf8;

    const std::string u { R"(UTF-8'')" };
    if ( const auto pos = header.find(u); pos != std::string::npos )
    {
        utf8 = header.substr(pos + u.size());
    }

    return { ascii, utf8 };
}

int main()
{
    const std::vector<std::string> headers
    {
        R"(Content-Disposition: attachment; filename="??.txt"; filename*=UTF-8''日本.txt)",
        R"(Content-Disposition: attachment; filename*=UTF-8''日本.txt)",
        R"(Content-Disposition: attachment; filename="filename.jpg")",
        R"(Content-Disposition: attachment; filename="file-2020-April.txt.vbs")"
    };

    for ( const auto& header : headers )
    {
        const auto& [ascii, utf8] = getFilename( header );
        std::cout << header
                  << "\n\tASCII: " << ascii
                  << "\n\tUTF-8: " << utf8 << '\n';
    }

    return 0;
}

Output:

Content-Disposition: attachment; filename="??.txt"; filename*=UTF-8''日本.txt
    ASCII: ??.txt
    UTF-8: 日本.txt
Content-Disposition: attachment; filename*=UTF-8''日本.txt
    ASCII: 
    UTF-8: 日本.txt
Content-Disposition: attachment; filename="filename.jpg"
    ASCII: filename.jpg
    UTF-8: 
Content-Disposition: attachment; filename="file-2020-April.txt.vbs"
    ASCII: file-2020-April.txt.vbs
    UTF-8: 
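
The example above relies on the quotes always being present and on the filename* value arriving as raw UTF-8. As a rough sketch of two of the "more cases" mentioned earlier, the following handles unquoted values and decodes the percent-encoding used by the RFC 5987 format that RFC 6266 refers to. The helper names percentDecode, paramValue and bestFilename are hypothetical, and the sketch assumes that the charset is UTF-8, that parameter names are lowercase, and that no quoted value itself contains the text "filename=". It is not a complete RFC 6266 parser.

```cpp
#include <cctype>
#include <string>

// Decode %XX escapes; malformed escapes are copied through unchanged.
std::string percentDecode( const std::string& in )
{
    std::string out;
    for ( std::size_t i = 0; i < in.size(); ++i )
    {
        if ( in[i] == '%' && i + 2 < in.size()
             && std::isxdigit( static_cast<unsigned char>( in[i + 1] ) )
             && std::isxdigit( static_cast<unsigned char>( in[i + 2] ) ) )
        {
            out += static_cast<char>( std::stoi( in.substr( i + 1, 2 ), nullptr, 16 ) );
            i += 2;
        }
        else
        {
            out += in[i];
        }
    }
    return out;
}

// Return the value of parameter `name` ("filename" or "filename*"), or "" if absent.
// Accepts both a quoted-string and a bare (unquoted) token.
std::string paramValue( const std::string& header, const std::string& name )
{
    const std::string key = name + "=";

    // Skip matches that do not start a parameter (e.g. "xfilename=").
    std::size_t pos = header.find( key );
    while ( pos != std::string::npos && pos > 0
            && header[pos - 1] != ';' && header[pos - 1] != ' ' && header[pos - 1] != '\t' )
    {
        pos = header.find( key, pos + 1 );
    }
    if ( pos == std::string::npos )
        return {};

    std::size_t i = pos + key.size();
    std::string value;
    if ( i < header.size() && header[i] == '"' )
    {
        // Quoted-string: read up to the closing quote, honoring backslash escapes.
        for ( ++i; i < header.size() && header[i] != '"'; ++i )
        {
            if ( header[i] == '\\' && i + 1 < header.size() )
                ++i;
            value += header[i];
        }
    }
    else
    {
        // Bare token: read up to the next ';' and trim trailing whitespace.
        const auto end = header.find( ';', i );
        value = header.substr( i, end == std::string::npos ? std::string::npos : end - i );
        while ( !value.empty() && std::isspace( static_cast<unsigned char>( value.back() ) ) )
            value.pop_back();
    }
    return value;
}

// Prefer filename* over filename, as RFC 6266 recommends when both are present.
std::string bestFilename( const std::string& header )
{
    // The extended value has the form: charset'language'percent-encoded-name
    const std::string ext = paramValue( header, "filename*" );
    const auto q1 = ext.find( '\'' );
    const auto q2 = ( q1 == std::string::npos ) ? std::string::npos : ext.find( '\'', q1 + 1 );
    if ( q2 != std::string::npos )
        return percentDecode( ext.substr( q2 + 1 ) );

    return paramValue( header, "filename" );
}
```

With these helpers, a header ending in filename*=UTF-8''%E6%97%A5%E6%9C%AC.txt should yield 日本.txt, and an unquoted filename=file-2020-April.txt.vbs should yield file-2020-April.txt.vbs. It is still a single pass over the header per lookup, so the O(n) bound does not change.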

Upvotes: 2
