If you’ve ever needed to use ColdFusion to manipulate Word documents, you might have tried a third-party library like docx4j or Apache POI.
While these libraries are very robust, I found each of them limiting in different ways. Apache POI lacks built-in mail merge capability (which seems very odd to me given the information below), and docx4j threw an odd Jakarta error that I was not able to fix.
I also looked at paid libraries like Aspose, but the licensing costs were prohibitive.
Finally I decided on direct OOXML manipulation.
This turned out to be much easier than I anticipated, once I learned that Office files like Word docx files are actually Zip archives!
Who knew!?!?!
So it’s possible to use the native cfzip tag to “open” the file.
<cfzip action="unzip" file="{Your Office Document File Path}" destination="{A temporary folder}" recurse="yes"/>
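Once unzipped to a folder, you will have a directory structure like this (the exact files vary from document to document):
[Content_Types].xml
_rels/.rels
docProps/app.xml
docProps/core.xml
word/document.xml
word/styles.xml
word/settings.xml
word/_rels/document.xml.rels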
Within the word subfolder there are several XML files. The one I am manipulating for the mail merge is document.xml.
This file can be read into ColdFusion using xmlParse() and from there can be manipulated like any other XML object using ColdFusion’s native tags and functions.
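As a rough illustration of that step, here is a minimal sketch that parses document.xml and lists the simple merge fields it contains. The folder path is hypothetical, and matching on local-name() is just one way to avoid dealing with the w: namespace prefix in the XPath; it is not necessarily how the original code did it.

```cfml
<cfscript>
// Hypothetical location: the folder the .docx was unzipped into.
tempDir = expandPath("./temp/unzipped");
docPath = tempDir & "/word/document.xml";

// Read the raw XML and parse it into a ColdFusion XML object.
docXml = xmlParse(fileRead(docPath, "utf-8"));

// Find every simple field, whatever prefix the document uses for the namespace.
simpleFields = xmlSearch(docXml, "//*[local-name()='fldSimple']");

for (field in simpleFields) {
    // w:instr holds the field code, for example: MERGEFIELD FirstName
    writeOutput(encodeForHTML(field.xmlAttributes["w:instr"]) & "<br>");
}
</cfscript>
```

The nodes returned by xmlSearch() point into the parsed document, so changes made to them should show up when the document is written back out.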
I couldn’t find any “standard” way for altering the XML to merge data into the various template fields.
To get an idea of how Word does it I created a test Word document with a simple and complex mail merge field: Word Merge Fields.docx
Running this through a Word mail merge, unzipping the resulting file, and reviewing the document.xml for the “merged” document, I found that simple merge fields (fldSimple) are completely replaced with the merged value, while complex fields are delimited by begin and end XML nodes that must be parsed and manipulated in specific ways.
In order to properly merge complex fields, it’s necessary to determine whether there is a separator XML node within the field. If so, the nodes between the separator node and the end node are used as the value. If no separator node is found, then the entire field is replaced with the merged value, just as with simple merge fields.
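To make the separator case concrete, here is a hedged sketch of one way to walk a paragraph and merge its complex fields. In the XML, the delimiters are w:fldChar elements whose w:fldCharType attribute is begin, separate, or end, and the field code sits in w:instrText. The function names, the values struct, and the file path are all hypothetical. The sketch assumes fields are not nested, begin and end in the same paragraph, and use field names without spaces. Fields with no separator, and fields that are not MERGEFIELDs, are deliberately left untouched here.

```cfml
<cfscript>
// Extracts the name from a field code such as: MERGEFIELD FirstName \* MERGEFORMAT
function mergeFieldName(required string instr) {
    var parts = listToArray(instr, " ");
    if (arrayLen(parts) >= 2 && parts[1] == "MERGEFIELD") {
        return replace(parts[2], '"', "", "all");
    }
    return "";
}

// para = a w:p node; values = struct of merge values keyed by field name.
function mergeComplexFields(required any para, required struct values) {
    var state = "outside";   // outside -> code -> result -> outside
    var instr = "";          // accumulated field code text
    var fieldName = "";
    var pending = [];        // child indexes to remove for the current field
    var toDelete = [];       // child indexes to remove for all merged fields
    var valueWritten = false;
    var i = 0;
    var j = 0;

    for (i = 1; i <= arrayLen(para.xmlChildren); i++) {
        var run = para.xmlChildren[i];
        var marker = "";
        for (j = 1; j <= arrayLen(run.xmlChildren); j++) {
            var child = run.xmlChildren[j];
            if (child.xmlName == "w:fldChar") {
                marker = child.xmlAttributes["w:fldCharType"];
            } else if (child.xmlName == "w:instrText" && state == "code") {
                instr &= child.xmlText;
            } else if (child.xmlName == "w:t" && state == "result" && len(fieldName)) {
                // The first text node gets the value; any others are blanked.
                if (valueWritten) {
                    child.xmlText = "";
                } else {
                    child.xmlText = structKeyExists(values, fieldName) ? values[fieldName] : "";
                }
                valueWritten = true;
            }
        }

        if (marker == "begin") {
            state = "code"; instr = ""; fieldName = "";
            pending = [i]; valueWritten = false;
        } else if (marker == "separate") {
            state = "result";
            fieldName = mergeFieldName(instr);
            arrayAppend(pending, i);
        } else if (marker == "end") {
            arrayAppend(pending, i);
            // fieldName is only set when a separator was seen on a MERGEFIELD.
            if (len(fieldName)) {
                for (j = 1; j <= arrayLen(pending); j++) {
                    arrayAppend(toDelete, pending[j]);
                }
            }
            state = "outside"; fieldName = ""; pending = [];
        } else if (state == "code") {
            arrayAppend(pending, i);   // field code run(s)
        }
    }

    // Remove from the end so the earlier indexes stay valid.
    for (i = arrayLen(toDelete); i >= 1; i--) {
        arrayDeleteAt(para.xmlChildren, toDelete[i]);
    }
}

// Usage with hypothetical data and a hypothetical path.
docXml = xmlParse(fileRead(expandPath("./temp/unzipped/word/document.xml"), "utf-8"));
mergeValues = { FirstName = "Jane", LastName = "Doe" };
paragraphs = xmlSearch(docXml, "//*[local-name()='p']");
for (p in paragraphs) {
    mergeComplexFields(p, mergeValues);
}
</cfscript>
```

Keeping the runs between the separator and the end marker, rather than building new ones, means the merged text keeps whatever formatting the placeholder had in the template.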
After updating the parsed XML, it must be written back to the document.xml file:
<cffile action="write" file="{Path to document.xml}" output="#toString(CF XML object)#" charset="utf-8">
Once the document.xml file has been saved, we need to rezip everything back into a Word docx file:
<cfzip action="zip" file="{Full path to resulting docx file}" source="{Folder containing unzipped contents}" recurse="yes"/>
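Putting the steps together, the whole round trip might look something like the sketch below. All of the paths and variable names are hypothetical, the merge itself is left as a placeholder comment, and creating and then deleting a temporary working folder is an assumption on my part rather than a required part of the technique.

```cfml
<!--- Hypothetical paths; adjust these to your environment. --->
<cfset templatePath = expandPath("./Word Merge Fields.docx")>
<cfset outputPath = expandPath("./merged.docx")>
<cfset workDir = getTempDirectory() & createUUID()>

<!--- Create the working folder, then unzip the template into it. --->
<cfdirectory action="create" directory="#workDir#">
<cfzip action="unzip" file="#templatePath#" destination="#workDir#" recurse="yes" />

<!--- Parse the main document part. --->
<cfset docPath = workDir & "/word/document.xml">
<cfset docXml = xmlParse(fileRead(docPath, "utf-8"))>

<!--- ...update docXml here to merge the field values... --->

<!--- Write the XML back and zip the folder into a new docx. --->
<cffile action="write" file="#docPath#" output="#toString(docXml)#" charset="utf-8">
<cfzip action="zip" file="#outputPath#" source="#workDir#" recurse="yes" overwrite="yes" />

<!--- Clean up the working folder. --->
<cfdirectory action="delete" directory="#workDir#" recurse="yes">
```

Zipping the contents of the working folder, rather than the folder itself, keeps [Content_Types].xml at the root of the archive, which Word expects.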
Fully merged document:
Word Merge Fields – Merged.docx
So far I’ve only used this for mail merges, but I’m sure that is only scratching the surface.