Faisul

Reputation: 29

How to sanitize a PDF with an open-source Java tool (like PDFBox)?

I am trying to improve the security of the file upload feature in a Spring-based web application.

It uses an antivirus to scan files for viruses before upload. However, we are additionally required to either sanitize the files or reject files with active content (JavaScript, automatic actions) before they are uploaded.

The allowed file formats are XLS, DOCX, and PDF, along with a few image formats.

Upvotes: 0

Views: 1447

Answers (2)

god

Reputation: 297

The easiest solution would be to use https://github.com/freedomofpress/dangerzone. It basically renders the input PDF (or converts the input to PDF first if necessary) and then outputs the rendered pixels as a new PDF.

The upside: no JS or other crap can survive the process. The downside: you lose clickable links, copyable text, etc.

Upvotes: -1

Faisul

Reputation: 29

For anyone who might need it later, this seems like a good starting point.

The DocBleach project (even though it is archived) has an implementation that detects malicious content and sanitizes the file. It also supports other office formats. It is built on top of PDFBox.

https://github.com/docbleach/DocBleach
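
If you only need the "reject files with active content" half of the requirement, a rough pre-check can be done with plain Java before reaching for a PDF library. The sketch below is a keyword heuristic of my own, not what DocBleach does: the class name `PdfActiveContentScan` and the list of names are assumptions for illustration. It looks for PDF name tokens that are commonly tied to scripts and automatic actions in the raw bytes. It will miss names hidden inside compressed object streams or written with `#xx` hex escapes, so a clean result does not prove the file is safe, and it does not sanitize anything.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Heuristic pre-upload check: flags PDF name tokens commonly tied to active content. */
final class PdfActiveContentScan {

    // Assumed list of names worth flagging: scripts, open/additional actions, launch actions.
    private static final String[] SUSPICIOUS_NAMES = {
        "/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch"
    };

    /** Returns the suspicious names that occur as whole name tokens in the raw bytes. */
    static List<String> findSuspiciousNames(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data is not mangled.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        List<String> hits = new ArrayList<>();
        for (String name : SUSPICIOUS_NAMES) {
            if (containsToken(raw, name)) {
                hits.add(name);
            }
        }
        return hits;
    }

    // True if `name` occurs and is not merely the prefix of a longer name (e.g. /JSON, /AAPL).
    private static boolean containsToken(String raw, String name) {
        int from = 0;
        while (true) {
            int idx = raw.indexOf(name, from);
            if (idx < 0) {
                return false;
            }
            int end = idx + name.length();
            if (end == raw.length() || endsName(raw.charAt(end))) {
                return true;
            }
            from = idx + 1;
        }
    }

    // A PDF name is terminated by whitespace or by one of the delimiter characters.
    private static boolean endsName(char c) {
        return c == 0 || Character.isWhitespace(c) || "()<>[]{}/%".indexOf(c) >= 0;
    }
}
```

In a Spring upload handler you could call `PdfActiveContentScan.findSuspiciousNames(file.getBytes())` on the `MultipartFile` and reject the upload when the list is not empty. Expect some false positives, since `/AA` and `/OpenAction` also appear in harmless documents.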

Upvotes: 1
