Reputation: 919
Recently, someone uploaded a file whose name contained illegal characters (a double hyphen), and the file could not be downloaded again afterwards. In this instance the file name was:
Some name -- some other information
On upload, the file name is set to the original file name, which is a business rule:
file.setFileName(file.getFile().getOriginalFilename());
The double hyphen ended up as two upside-down question marks, and for whatever reason the file could then no longer be retrieved from the server.
Is there a programmatic way to check the original file name for situations like this?
For transparency, here is the code for uploading the file:
public void saveOpcertCeuFile(OpcertCeuFileUpload file) {
    UmdContact user = secUtilService.getActiveUser();
    String username = user.getEmail();
    Date now = new Date();
    file.setCreatedTs(now);
    file.setLastUpdatedTs(now);
    file.setCreatedBy(username);
    file.setLastUpdatedBy(username);
    file.setFileName(file.getFile().getOriginalFilename());
    file.setIsApproved(Boolean.FALSE);
    file.setIsDeleted(Boolean.FALSE);
    try {
        file.setByteContents(file.getFile().getBytes());
    } catch (Exception ex) {
        log.info(ex);
        throw new RuntimeException(ex);
    }
    dao.insertOpcertCeuFileUpload(file);
    Path path = this.getOptcertCeuFilePath(file);
    String configF = envService.getServerUrl();
    file.setFilePath(String.valueOf(path));
    dao.updateOpcertCeuFilePath(file);
    try {
        File file1 = path.toFile();
        file1.getParentFile().mkdirs();
        Files.write(path, file.getByteContents(), StandardOpenOption.CREATE_NEW);
    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }
}
Upvotes: 2
Views: 1113
Reputation: 35427
If you want to store files, name them according to whatever rules you want, but don't let the user dictate the name. Will there be name conflicts? Does a filename contain invalid characters? You never know.
So use your own naming convention. You say that a business rule forces you to keep the original filename, so keep it, just in another place.
For instance, you get the file Hello--World.txt, use the name 20201124-000001.uploaded on your filesystem, and store in some metadata that the original filename is Hello--World.txt. When somebody wants to download that file, just provide the original filename as the download name. This way you keep the original name as metadata associated with the file, and you keep your system secure.
Example in your code:
// Name on filesystem.
file.setFileName(date + "-" + orderingNumberForDate(date) + ".uploaded");
// Name in the metadata (text or db)
file.setOriginalFileName(file.getFile().getOriginalFilename());
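A slightly fuller sketch of the same idea, in case it helps. It swaps the date-plus-counter scheme above for a random UUID, simply because that needs no counter; the class name StoredUpload and its fields are hypothetical and not part of your code base.

```java
import java.util.UUID;

// Hypothetical holder for the two names: one generated by us, one supplied by the user.
final class StoredUpload {
    final String storageName;   // name used on the filesystem, never user-controlled
    final String originalName;  // name supplied by the user, kept only as metadata

    private StoredUpload(String storageName, String originalName) {
        this.storageName = storageName;
        this.originalName = originalName;
    }

    // Generates a fresh storage name and records the original name unchanged.
    static StoredUpload forOriginalName(String originalName) {
        String storageName = UUID.randomUUID() + ".uploaded";
        return new StoredUpload(storageName, originalName);
    }
}
```

In the upload method you would then write the bytes to a path built from storageName and persist originalName in the database row, sending it back only as the download name.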
Upvotes: 2
Reputation: 102988
Use whitelisting.
Make a list of characters you find acceptable. Then run the input file name through a filter: make a StringBuilder and loop through each character, keeping the characters on your list and dropping or replacing the rest.
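A minimal sketch of such a filter, assuming a whitelist of ASCII letters, digits, underscore, dash, dot and space, and assuming you would rather replace a rejected character with an underscore than drop it. The class and method names are made up for illustration.

```java
// Hypothetical helper: keeps whitelisted characters, replaces everything else.
final class FileNameSanitizer {
    // Assumed whitelist of punctuation, in addition to ASCII letters and digits.
    private static final String EXTRA_ALLOWED = "_-. ";

    static boolean isAllowed(char c) {
        return (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || EXTRA_ALLOWED.indexOf(c) >= 0;
    }

    // Loops over every char and swaps anything not on the whitelist for '_'.
    static String sanitize(String originalName) {
        StringBuilder sb = new StringBuilder(originalName.length());
        for (int i = 0; i < originalName.length(); i++) {
            char c = originalName.charAt(i);
            sb.append(isAllowed(c) ? c : '_');
        }
        return sb.toString();
    }
}
```

With this whitelist a plain ASCII double hyphen passes through untouched, while a typographic dash (which is what word processors often turn "--" into) becomes an underscore. You would call it as file.setFileName(FileNameSanitizer.sanitize(file.getFile().getOriginalFilename())).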
If you want to get really fancy you could build a much more involved system that attempts to map any character onto a filesystem-valid character, e.g. mapping 'é' onto 'e', a non-breaking space onto nothing (no character at all, an empty string), 'ß' onto 'ss', and more. But that doesn't sound like a worthwhile effort here, and in many ways it is literally impossible: 'ö' becomes 'oe' in German, but 'o' in Swedish. How do you know whether the name in the file is Swedish or German? You don't, so no foolproof conversion is possible in the first place.
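For what it's worth, the simplest version of such a mapping can be sketched with java.text.Normalizer: decompose to NFD and strip the combining marks. This is only an illustration of the limits described above, not a recommendation; the class name AccentFolding is made up.

```java
import java.text.Normalizer;

// Hypothetical helper: folds accented letters onto their base letters.
final class AccentFolding {
    static String stripDiacritics(String input) {
        // NFD splits 'é' into 'e' plus a combining acute accent.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches the combining marks left behind by the decomposition.
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

This always turns 'ö' into 'o', so it silently picks the Swedish-style mapping, and it leaves 'ß' alone because that character has no decomposition. Anything beyond that needs hand-written, language-specific rules.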
NB: You could put in some effort and figure out which characters are and aren't legal on the filesystem you're on. Many systems accept almost everything, even really bizarre characters, because filenames are mostly just bags o' bytes, and the only reason you can't put a slash in there is that various tools will interpret it as a separator. But then you still end up with files whose names may be acceptable on your system yet are hard to move around, and which cause issues because browsers don't consider such characters valid even if they are. Thus, I advise whitelisting only the simple characters: letters, digits, underscores, maybe dollar signs, dots, dashes, and spaces.
Upvotes: 1