Why docxtemplater is so awesome to generate docx files

I’m creating an open-source library on GitHub. It generates docx files from a template and data, much like what you’re used to doing for HTML with templating languages such as Mustache, Twig (PHP), HAML (Ruby and Node) or Jade (Node). The library is called docxtemplater and is specific to docx, the format used by Microsoft Word 2007 and later. It was originally named docxgenjs, but I’m opinionated and only want to generate docx from templates, so I renamed it docxtemplater. I also dropped the JS suffix: the library is not tightly bound to JavaScript, because its command line interface (CLI) makes it usable from any language.

There are two main differences between HTML and docx, and they make it difficult to use one of the mainstream templating languages for docx:

  1. Docx is a zipped format, so a solution that works out of the box must at least unzip and then rezip the document. You also have to write the logic that modifies the right files inside the archive.
  2. Docx is made of XML, and the text is split across the document. For example, writing “{tag}” in a Word document often results in the following XML structure:


<w:t>{</w:t>...<w:t>tag</w:t>...<w:t>}</w:t>

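This makes it almost impossible to use a “normal” implementation of a templating language: an engine that scans the raw XML never sees the string {tag}, only its pieces scattered across several <w:t> elements.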
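To see why the split matters, here is a small JavaScript sketch (an illustration, not docxtemplater’s own code) that looks for tags first in the raw XML and then in the concatenated text of the <w:t> elements. The tag pattern and the regex-based run extraction are simplifying assumptions; a real implementation would use an XML parser.

```javascript
// Illustrative paragraph: the user typed "{tag}", but Word stored it as three runs.
const xml =
  "<w:p><w:r><w:t>{</w:t></w:r><w:r><w:t>tag</w:t></w:r><w:r><w:t>}</w:t></w:r></w:p>";

// A tag is "{" + name + "}", where the name is letters, digits, "_" or ".".
const TAG = /\{([\w.]+)\}/g;

// Naive approach: search the raw XML, as an HTML templating engine would.
function findTagsNaive(source) {
  return [...source.matchAll(TAG)].map((m) => m[1]);
}

// Collect the text of every <w:t> element in document order.
// Simplified regex, not a real XML parser; it accepts attributes such as xml:space.
function textRuns(source) {
  return [...source.matchAll(/<w:t(?:\s[^>]*)?>([^<]*)<\/w:t>/g)].map((m) => m[1]);
}

// Joining the runs restores the text as the user typed it.
function findTagsInText(source) {
  return findTagsNaive(textRuns(source).join(""));
}

console.log(findTagsNaive(xml)); // → []
console.log(findTagsInText(xml)); // → [ 'tag' ]
```

Finding the tag is only half of the job: a real templating engine also has to map the match back onto the original runs and rewrite them, which this sketch does not attempt.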

Besides variables, the library handles loops, conditions, custom parsing of variables (for example with the Angular expression syntax) and image replacement.
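To make the loop and condition semantics concrete, here is a toy renderer for plain strings. It is not docxtemplater’s implementation (which has to work on the XML) and it assumes a Mustache-like section syntax, {#name} to open and {/name} to close: an array repeats the section once per item, a truthy value renders it once, and a falsy value drops it.

```javascript
// Toy renderer for plain strings (not docxtemplater's code).
// {name}             -> value of data[name] ("" when missing)
// {#name}...{/name}  -> array: repeat once per item, with the item's fields in scope
//                       truthy: render once; falsy: drop the section
// One regex pass handles both forms, so substituted values are never re-parsed.
// Nested sections that reuse the same name are not supported.
function render(template, data) {
  return template.replace(
    /\{#(\w+)\}([\s\S]*?)\{\/\1\}|\{(\w+)\}/g,
    (match, section, inner, variable) => {
      if (variable !== undefined) {
        const value = data[variable];
        return value === undefined || value === null ? "" : String(value);
      }
      const value = data[section];
      if (Array.isArray(value)) {
        return value
          .map((item) =>
            render(inner, typeof item === "object" && item !== null ? { ...data, ...item } : data)
          )
          .join("");
      }
      return value ? render(inner, data) : "";
    }
  );
}

const template = "Hello {name}!{#hasKitty} Cat: {kitty}.{/hasKitty}{#users} [{user}]{/users}";
console.log(
  render(template, {
    name: "Ann",
    hasKitty: true,
    kitty: "Tom",
    users: [{ user: "A" }, { user: "B" }],
  })
); // → Hello Ann! Cat: Tom. [A] [B]
```

The same data object drives both constructs: the template decides what is a loop and what is a condition, so the application code only has to supply values.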

Why narrow it down to a docx templating system?

I have a big problem with all the other libraries that try to solve similar problems: they all require you to write the whole content of the document in your application code. This seems very inefficient to me. Why the heck should I write static paragraphs inside my code base? The software industry has agreed for a long time now that it’s better to use templates to generate HTML than to echo stuff out from your application. Yet this particular problem still seems to be tackled with old standards. Maybe you want some superpower and need to insert something very complex into your document (e.g. something that is neither text nor an image). Even then, I created a syntax that lets you insert custom XML: {@tag}. If you want to insert XML that cannot be generated with normal tags, use this raw XML tag. But please, avoid making the same mistakes again: don’t put all your static content in your code base, that’s exactly what templates are for!
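A value for a {@tag} has to be valid WordprocessingML on its own. As a minimal sketch, assuming the raw XML tag receives a whole paragraph (a <w:p> element), the hypothetical helpers escapeXml and paragraphXml below build one bold paragraph; they are not part of docxtemplater.

```javascript
// Escape the five XML special characters so user text cannot break the markup.
// "&" must be replaced first, otherwise the other entities would be escaped twice.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: one paragraph (w:p) containing a single run (w:r).
// bold adds <w:b/> run properties; xml:space="preserve" keeps leading/trailing spaces.
function paragraphXml(text, { bold = false } = {}) {
  const runProps = bold ? "<w:rPr><w:b/></w:rPr>" : "";
  return `<w:p><w:r>${runProps}<w:t xml:space="preserve">${escapeXml(text)}</w:t></w:r></w:p>`;
}

// Data for a template containing {@tag}: the value is meant to be inserted as raw XML.
const rawData = { tag: paragraphXml("Fish & Chips", { bold: true }) };
console.log(rawData.tag);
// → <w:p><w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">Fish &amp; Chips</w:t></w:r></w:p>
```

Escaping matters here: a raw tag inserts the string as-is, so an unescaped & or < coming from user text would likely leave Word with a corrupt document.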

Even better, if you use templates, you will find it much easier to switch from one output type to another, because all the output types share the same data (but obviously not the same template).

There’s one more bonus to using templates: everyone can edit Word templates, but only a developer can change code that echoes out parts of the docx. So by using docxtemplater rather than another library, you as a developer handle the data binding, and your manager (or whoever) can create the docx template.

9 thoughts on “Why docxtemplater is so awesome to generate docx files”

  1. After an afternoon spent on docxtemplater I agree with you: it is really awesome!
    But … I don’t quite understand how to number pages.
    In my template.docx I have a footer with: Page {page} of {pages}.
    How do I set this properly if the exact number of pages is known only after rendering?
    (For example, some pages may contain large photos and data, others only data, and so on.)

    Another thing: I’ve tried to organize my template in sections, like this:
    ——————————————————————————-
    {header.title}
    Dear {body.first_name} {body.last_name}
    {body.content}

    yours Sincerely,
    Diego

    {footer.timestamp} Page {page} of {pages}
    ——————————————————————————-

    but I get ‘undefined’ in all fields.
    Removing ‘header.’, ‘body.’ and ‘footer.’ works well.

    Any ideas/suggestions, especially for page numbering?

    Now, after getting the docx output, I’m trying to convert it to other formats (plain text, pptx, xlsx, pdf …) using LibreOffice on the command line: “soffice --headless --convert-to … output.docx”

    Any other ideas on how to accomplish this?

    Thanks for your software!

    dieo

    1. Hi Diego,

      Sorry, I just saw your comment. I hope this response still helps you, or that you found another solution. Sadly, there is no way to get the page number, and I don’t think any of the other libraries does this either. The reason is that the number of pages depends on the client reading the docx: it might differ between Microsoft Word 2007, 2010 and 2013, between the Mac and Windows versions, and on mobile. The page count and page numbers are not written inside the XML; they are calculated by the client, which makes a page counter very difficult to implement. There’s a question on SO about that particular point: http://stackoverflow.com/questions/18479354/convert-docx-page-1-at-a-time-to-an-image

      However, if you’re putting your page numbers in the footer, you can let Word write them itself by using its PAGE and NUMPAGES fields in the template instead of tags, since Word computes those values when it lays out the document.

      The dot syntax (e.g. {body.first_name}) is not available in the base parser, but you can use the Angular parser for it. See the docs on that point: http://docxtemplater.readthedocs.org/en/latest/configuration.html?highlight=angular#custom-parser
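For comparison with the Angular parser, here is a minimal sketch of a parser that only understands dotted paths. It assumes the custom parser shape described in the docxtemplater documentation (a function that receives the tag text and returns an object with a get(scope) method); getPath and dotParser are hypothetical names, and the Angular parser supports far more than this (full expressions).

```javascript
// Resolve a dotted path such as "body.first_name" against a data object.
// Returns undefined as soon as a segment is missing.
function getPath(scope, path) {
  if (path === ".") return scope; // "." is treated as the current scope itself
  return path
    .split(".")
    .reduce(
      (value, key) => (value === undefined || value === null ? undefined : value[key]),
      scope
    );
}

// Assumed parser shape: tag text in, object with get(scope) out.
function dotParser(tag) {
  const path = tag.trim();
  return {
    get(scope) {
      return getPath(scope, path);
    },
  };
}

// Hypothetical data matching the template from the comment above.
const letterData = {
  header: { title: "Invoice" },
  body: { first_name: "Diego", last_name: "Doe" },
};
console.log(dotParser("body.first_name").get(letterData)); // → Diego
console.log(dotParser("footer.timestamp").get(letterData)); // → undefined
```

You would then hand dotParser to the library through its parser option; how that option is set has changed between docxtemplater versions, so check the docs for the version you use.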

      For the conversion, I can only think of pandoc; I don’t know of any better solution (LibreOffice, as you are doing, is fine too).
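If you want to drive the LibreOffice conversion from Node.js, here is a minimal sketch using child_process.execFile. It assumes the soffice binary is on the PATH; sofficeArgs and convertWithLibreOffice are hypothetical helper names, while --headless, --convert-to and --outdir are standard soffice command-line options.

```javascript
const { execFile } = require("child_process");

// Build the argument list for a headless LibreOffice conversion, e.g. to "pdf" or "txt".
function sofficeArgs(inputPath, format, outDir) {
  return ["--headless", "--convert-to", format, "--outdir", outDir, inputPath];
}

// Run the conversion; callback receives (error, stdout, stderr) from execFile.
// The converted file keeps the input's base name and is written to outDir.
function convertWithLibreOffice(inputPath, format, outDir, callback) {
  execFile("soffice", sofficeArgs(inputPath, format, outDir), callback);
}

// Example (requires LibreOffice installed):
// convertWithLibreOffice("output.docx", "pdf", "converted", (err) => { if (err) throw err; });
```

execFile passes the arguments directly, without a shell, so file names containing spaces need no extra quoting.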

  2. I have been trying to use your templater within SharePoint. I need to use SharePoint code and the templater on the same page, which requires the page to be an ASPX page instead of HTML. I can use your templater without errors in an HTML page, but I think ASPX breaks the JSZip features.

    The error I get is “End of data reached” when attempting to read documents. I am sure there is some kind of conflict between the SharePoint scripts and your code in main.min.js.

    Do you have an example of your code running on a SharePoint server?

  3. Hi,
    Thanks for the awesome JS lib. I have an issue with the table of contents: do you have any solution to update the TOC automatically?
    Thank you.

    1. Hi Linh, I don’t think it is possible to update the TOC automatically, it would probably require a docxtemplater module to do that, but I don’t think it would be that simple.

  4. Hello, good day,
    First, I want to thank you for your work; the module was a great help to me.
    My question: when I generate a docx, the default language of the generated document (the one used for spelling correction) is French, and I would like to know if there is any way to change it to another language, for example Spanish.
    Thank you.

    1. Hi jposltre, I’m not sure I understand what you’d like to achieve, and I don’t see how docxtemplater could do anything regarding translation. However, you could easily implement your own i18n, for example with a custom parser that outputs different words depending on the chosen language.
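A minimal sketch of that i18n idea, assuming the same custom parser shape (a function of the tag text returning an object with get(scope)). The t: prefix convention, the translations table and makeI18nParser are all hypothetical: with them, a template containing {t:greeting} {first_name} would render the greeting in the chosen language.

```javascript
// Hypothetical translation table; keys are the names used after "t:" in the template.
const translations = {
  en: { greeting: "Dear", closing: "Yours sincerely" },
  es: { greeting: "Estimado", closing: "Atentamente" },
};

// Parser factory: tags starting with "t:" are looked up in the chosen language
// (falling back to English for an unknown language); other tags are read from the data.
function makeI18nParser(language) {
  const table = translations[language] || translations.en;
  return function parser(tag) {
    const name = tag.trim();
    return {
      get(scope) {
        if (name.startsWith("t:")) {
          return table[name.slice(2)];
        }
        return scope[name];
      },
    };
  };
}

const spanish = makeI18nParser("es");
console.log(`${spanish("t:greeting").get({})} ${spanish("first_name").get({ first_name: "Diego" })}`);
// → Estimado Diego
```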

  5. Hi Edgar. I’m getting these errors when trying to use your library:
    docxtemplater-latest.js:5 Uncaught SyntaxError: Unexpected token <
    FileSaver.min.js:5 Uncaught SyntaxError: Unexpected token <
    jszip-utils.js:5 Uncaught SyntaxError: Unexpected token <

    Any idea what I am missing?

    Thanks in advance
