Reputation: 1427
I want to extract every valid form of file name from the "filename" attribute of the Content-Disposition HTTP header, as in the following examples:
Content-Disposition: attachment; filename="filename.jpg"
Content-Disposition: attachment; filename="file-2020-April.txt.vbs"
Moreover, file names sometimes contain non-ASCII characters, and in that case the correct file name comes from the "filename*" attribute, as in the following example (this is just an example, not actual data):
Content-Disposition: attachment; filename="??.txt"; filename*=UTF-8''日本.txt
I used the following string functions to extract the value of filename=" only:
string ContentDispositionHeader;
int startPos = ContentDispositionHeader.find("\"");
startPos++;
int endPos = ContentDispositionHeader.find_last_of("\"");
int length = endPos - startPos;
string filename = ContentDispositionHeader.substr(startPos, length);
However, I need code that handles both naming cases (plain and UTF-8). Is there a faster or easier way to extract the file names?
Upvotes: 1
Views: 1175
Reputation: 14627
I believe you cannot get faster than O(n), where n is the length of the header, if that's what you are looking for. And this is what you're already doing.
The following example extracts the filenames from the headers in a similar fashion. It assumes that the quotes are always present (refer to RFC 6266 for more on this) and that the UTF-8 form always follows the ASCII one if the latter is present. There may be more cases that you need to handle while parsing the header.
Here's the example:
#include <iostream>
#include <string>
#include <vector>
#include <utility>
// Filenames: <ASCII, UTF-8>
using Filenames = std::pair<std::string, std::string>;
// Returns the quoted filename="..." value and the raw text after UTF-8'';
// each part is left empty when the corresponding parameter is missing.
Filenames getFilename( const std::string& header )
{
    std::string ascii;
    const std::string q1 { R"(filename=")" };
    if ( const auto pos = header.find(q1); pos != std::string::npos )
    {
        // Offset of the first character after the opening quote.
        const auto start = pos + q1.size();
        const std::string q2 { R"(")" };
        if ( const auto end = header.find(q2, start); end != std::string::npos )
        {
            ascii = header.substr(start, end - start);
        }
    }
    std::string utf8;
    const std::string u { R"(UTF-8'')" };
    if ( const auto pos = header.find(u); pos != std::string::npos )
    {
        // Takes everything from the marker to the end of the header.
        utf8 = header.substr(pos + u.size());
    }
    return { ascii, utf8 };
}
int main()
{
    const std::vector<std::string> headers
    {
        R"(Content-Disposition: attachment; filename="??.txt"; filename*=UTF-8''日本.txt)",
        R"(Content-Disposition: attachment; filename*=UTF-8''日本.txt)",
        R"(Content-Disposition: attachment; filename="filename.jpg")",
        R"(Content-Disposition: attachment; filename="file-2020-April.txt.vbs")"
    };
    for ( const auto& header : headers )
    {
        const auto& [ascii, utf8] = getFilename( header );
        std::cout << header
                  << "\n\tASCII: " << ascii
                  << "\n\tUTF-8: " << utf8 << '\n';
    }
    return 0;
}
Output:
Content-Disposition: attachment; filename="??.txt"; filename*=UTF-8''日本.txt
	ASCII: ??.txt
	UTF-8: 日本.txt
Content-Disposition: attachment; filename*=UTF-8''日本.txt
	ASCII:
	UTF-8: 日本.txt
Content-Disposition: attachment; filename="filename.jpg"
	ASCII: filename.jpg
	UTF-8:
Content-Disposition: attachment; filename="file-2020-April.txt.vbs"
	ASCII: file-2020-April.txt.vbs
	UTF-8:
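One of the extra cases worth handling: in real headers the filename* value is normally percent-encoded as described in RFC 5987 (for example UTF-8''%E6%97%A5%E6%9C%AC.txt), it may carry a language tag between the two single quotes, and it may be followed by further parameters. The following is only a minimal sketch of that case, not a full RFC 6266 parser. The helper names percentDecode and extFilename are made up for illustration, and the sketch assumes the charset is UTF-8, the parameter name is written in lowercase, and there is no whitespace around the value:
```cpp
#include <cctype>
#include <string>

// Hypothetical helper: turns %XX escapes into raw bytes.
// Malformed or truncated escapes are copied through unchanged.
std::string percentDecode( const std::string& in )
{
    std::string out;
    for ( std::size_t i = 0; i < in.size(); ++i )
    {
        if ( in[i] == '%' && i + 2 < in.size()
             && std::isxdigit( static_cast<unsigned char>( in[i + 1] ) )
             && std::isxdigit( static_cast<unsigned char>( in[i + 2] ) ) )
        {
            out += static_cast<char>( std::stoi( in.substr( i + 1, 2 ), nullptr, 16 ) );
            i += 2;
        }
        else
        {
            out += in[i];
        }
    }
    return out;
}

// Hypothetical helper: returns the decoded filename* value, or "" if absent.
// Assumption: the charset is UTF-8; it is not checked here.
std::string extFilename( const std::string& header )
{
    const std::string key { "filename*=" };
    const auto keyPos = header.find( key );
    if ( keyPos == std::string::npos ) return {};
    const auto start = keyPos + key.size();
    // The value stops at the next parameter, if there is one.
    auto end = header.find( ';', start );
    if ( end == std::string::npos ) end = header.size();
    // Skip charset'language' (the language tag may be empty).
    const auto q1 = header.find( '\'', start );
    if ( q1 == std::string::npos || q1 >= end ) return {};
    const auto q2 = header.find( '\'', q1 + 1 );
    if ( q2 == std::string::npos || q2 >= end ) return {};
    return percentDecode( header.substr( q2 + 1, end - q2 - 1 ) );
}
```
When extFilename returns a non-empty string, RFC 6266 recommends preferring it over the plain filename value; otherwise fall back to the quoted one. This is still a single pass over the header, so it stays O(n).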
Upvotes: 2