Java Convert Word to PDF with UNO – Part 1

7 February 2015

Libre Office and Open Office are free office suites that provide, among other things, a programmatic interface (called the UNO API) to load, manipulate and save documents.

The steps below create a program to load a Microsoft Word document, make changes, and save it to PDF format.

Objective

We’ll use the code below to mail merge a DOC file to PDF. The process is:

  • set up your environment
  • initialise
  • load the document
  • substitute the data (mail merge)
  • save as PDF
  • shut down

This starting point will let you test all sorts of document conversions and mail merging scenarios.

Step 1 – Setup

We need to add the Libre Office (or Open Office, if that’s what you’ve installed) JARs to our classpath. These JARs give us access to the Java UNO API that we’ll be calling to do all sorts of magic. In your install of Libre Office, locate the UNO JAR files and make sure they are on your project classpath.
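Once the JARs are in place, a quick probe can confirm that the UNO classes are actually visible before any Office code runs. The sketch below is a minimal check and not part of the original download. It assumes the two probe class names shown (both standard UNO types), and the class name UnoClasspathCheck is made up for illustration.

```java
// Sanity check: are the UNO classes visible on the classpath?
final class UnoClasspathCheck {

    /** Returns true if the named class can be located (without initialising it). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, UnoClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Two well-known UNO types; if either is missing, a JAR is probably absent.
        String[] probes = {
            "com.sun.star.comp.helper.Bootstrap",
            "com.sun.star.frame.XComponentLoader"
        };
        for (String probe : probes) {
            System.out.println(probe + " -> " + (isOnClasspath(probe) ? "found" : "MISSING"));
        }
    }
}
```

If either line reports MISSING, recheck the classpath before moving on to the next step.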

You can download the code and template to save time.

Step 2 – Starting the Office Process

Boot a Libre Office process that will listen for our requests.
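One way to get a listening process, which is not necessarily how the downloadable code does it, is to launch the soffice executable yourself with an accept string and connect to it over a socket. The sketch below only builds and starts that command. The executable name "soffice" (assumed to be on the PATH), the port 2002 and the class name OfficeLauncher are all assumptions for illustration. Open Office has traditionally used single-dash spellings such as -headless, so check the options against your version.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

final class OfficeLauncher {

    /** Builds the command line. sofficePath and port are illustrative assumptions. */
    static List<String> command(String sofficePath, int port) {
        return Arrays.asList(
            sofficePath,
            "--headless",   // no user interface
            "--norestore",  // skip the document recovery dialog after a crash
            "--accept=socket,host=127.0.0.1,port=" + port + ";urp;");
    }

    /** Starts the process; the caller is responsible for destroying it later. */
    static Process start(String sofficePath, int port) {
        try {
            return new ProcessBuilder(command(sofficePath, port)).inheritIO().start();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not start " + sofficePath, e);
        }
    }

    public static void main(String[] args) {
        // Print the command rather than launching, so the sketch runs anywhere.
        System.out.println(String.join(" ", command("soffice", 2002)));
    }
}
```

The UNO helper class com.sun.star.comp.helper.Bootstrap offers a bootstrap() method that starts and connects to an Office process in one call, which may be the simpler route if the JARs sit inside the Office installation.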

Step 3 – Loading a Document

The code below loads a template into the Libre Office engine. Notice two things:

  1. It expects to find the template at c:/projects/letterTemplate.doc (so you should change this as required).
  2. The load process uses a “Hidden” flag. This can be set to false to see the process working.
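UNO's loader (loadComponentFromURL) takes a URL rather than a plain file path, so the template location has to be expressed in file:/// form. The helper below is a small sketch of that conversion using only the JDK. It assumes an absolute path, and the class and method names are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class FileUrls {

    /** Converts an absolute path such as c:/projects/x.doc into a file:/// URL. */
    static String toFileUrl(String absolutePath) {
        String path = absolutePath.replace('\\', '/');
        if (!path.startsWith("/")) {
            path = "/" + path; // c:/x.doc becomes /c:/x.doc
        }
        try {
            // An empty host produces the file:/// form, and the constructor
            // percent-encodes characters such as spaces.
            return new URI("file", "", path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable path: " + absolutePath, e);
        }
    }

    public static void main(String[] args) {
        // prints file:///c:/projects/letterTemplate.doc
        System.out.println(toFileUrl("c:/projects/letterTemplate.doc"));
    }
}
```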

Step 4 – Search and Replace

The search and replace looks for three placeholders:

  • “<date>”, which is replaced with the current date and time
  • “<addressee>”
  • “<signatory>”
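The substitution itself boils down to a map from placeholder to value. The sketch below shows that logic on a plain string, as a stand-in for the real thing: in the actual program the same pairs would be applied to the loaded document through UNO's search-and-replace facility instead. The sample names, the date format and the class name PlaceholderMerge are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

final class PlaceholderMerge {

    /** Replaces every occurrence of each placeholder with its value. */
    static String merge(String text, Map<String, String> values) {
        String result = text;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("<date>", new SimpleDateFormat("d MMMM yyyy HH:mm").format(new Date()));
        values.put("<addressee>", "Jane Citizen"); // hypothetical sample data
        values.put("<signatory>", "John Smith");   // hypothetical sample data

        String template = "<date>\nDear <addressee>,\n...\nRegards, <signatory>";
        System.out.println(merge(template, values));
    }
}
```

Keeping the pairs in one map means a new placeholder is a one-line change, whichever replace mechanism ends up consuming it.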

Step 5 – Export to PDF

The Libre Office filter name “writer_pdf_Export” is used to save the document as a PDF.

Step 6 – Shutdown

This terminates the process launched in step 2 above. Instead of terminating here, you could go on to load, manipulate and save more documents with the same process.

You can put all the above code together by copy-and-pasting, or you can download the code and template.

Gotchas 1 – Multithreading

It’s possible, but not advisable, to use this approach in a multi-threaded fashion. Experience has shown that this leads to instability and unpredictable results. Of course, you could launch multiple Libre Office processes to handle many requests, each in a single-threaded manner.
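A simple way to keep one Office process single-threaded, even when requests arrive from many threads, is to funnel every job through a single-thread executor. The sketch below shows the idea with plain JDK classes. The class name SerialOfficeQueue is made up, and the jobs here are placeholders for real load, merge and export calls.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class SerialOfficeQueue implements AutoCloseable {
    // One worker thread, so jobs run strictly one at a time, in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Runs the job on the worker thread and blocks until it finishes. */
    <T> T run(Supplier<T> job) {
        Callable<T> task = job::get;
        try {
            return worker.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the job", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Job failed", e.getCause());
        }
    }

    @Override
    public void close() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        try (SerialOfficeQueue queue = new SerialOfficeQueue()) {
            String first = queue.run(() -> Thread.currentThread().getName());
            String second = queue.run(() -> Thread.currentThread().getName());
            System.out.println(first.equals(second)); // prints true
        }
    }
}
```

To handle more load, you would create several of these queues, each paired with its own Office process, rather than adding threads to one.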

Gotchas 2 – Process and Crash Management

Under a realistic workload, there are documents that can crash the process. This means a real production version of this approach would need to expect the occasional failure, clean up, and restart the process. Ideally this would all be transparent to the calling user or program.

Likewise, you want to make sure you clean up any resources you use, in cases of success and in cases of failure. Here we are spawning a separate process, which is definitely something you always want to clean up.
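The restart-and-clean-up pattern can be sketched independently of UNO. The wrapper below starts an "engine" lazily, discards and closes it when a job throws, and retries on a fresh one up to a limit. Everything here is hypothetical scaffolding: the class name RestartingRunner, the idea of modelling the Office process as an AutoCloseable, and the choice to treat any RuntimeException as a crash.

```java
import java.util.function.Function;
import java.util.function.Supplier;

final class RestartingRunner<E extends AutoCloseable> implements AutoCloseable {
    private final Supplier<E> starter;
    private final int maxAttempts;
    private E engine; // null until first use, and again after a failure

    RestartingRunner(Supplier<E> starter, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.starter = starter;
        this.maxAttempts = maxAttempts;
    }

    /** Runs the job, restarting the engine after each failure, up to maxAttempts. */
    synchronized <T> T run(Function<E, T> job) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (engine == null) {
                    engine = starter.get();
                }
                return job.apply(engine);
            } catch (RuntimeException e) {
                last = e;
                close(); // throw the suspect engine away before retrying
            }
        }
        throw new IllegalStateException("Gave up after " + maxAttempts + " attempts", last);
    }

    @Override
    public synchronized void close() {
        if (engine == null) {
            return;
        }
        try {
            engine.close();
        } catch (Exception ignored) {
            // Best effort: the engine may already be dead.
        } finally {
            engine = null;
        }
    }

    public static void main(String[] args) {
        int[] starts = {0};
        int[] calls = {0};
        try (RestartingRunner<AutoCloseable> runner = new RestartingRunner<AutoCloseable>(
                () -> { starts[0]++; return () -> { }; }, 3)) {
            String result = runner.run(office -> {
                if (calls[0]++ == 0) {
                    throw new IllegalStateException("simulated crash");
                }
                return "converted";
            });
            System.out.println(result + " after " + starts[0] + " starts"); // prints converted after 2 starts
        }
    }
}
```

In a real version, the engine's close() would terminate the spawned Office process, and the try-with-resources block guarantees that happens on both the success and the failure path.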

Downloads / Resources

You can download a zip of this example.

There are many more examples of using the UNO API in the Libre Office SDK.