
The sudden realization that the new MS Word format, .docx, is called Office Open XML for a reason made me spend a whole day trying to figure out how XSL transformations actually work, and whether they could be used to convert these new .docx files into something more edi(ta)ble.

It turned out that XSL transformations are, in principle, pretty simple to write, just like a friend had told me. Here’s an example of how to convert a .docx file to LaTeX, in its crudest form:

First, you need to break open the .docx file. It is basically a plain zip archive, so ‘unzip testdoc.docx’ should do the trick; you’ll end up with several files and sub-directories, of which only the directory called ‘word’ is needed for this test.
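If you don’t have unzip at hand, the same peek inside the archive can be sketched with Python’s standard zipfile module. The helper names (list_word_parts, read_part, make_fake_docx) are made up for this illustration, and the stand-in archive built here is not a valid Word document; it only mimics the layout so the sketch runs without a real file.

```python
import io
import zipfile


def list_word_parts(docx_bytes):
    """Return the sorted names of the archive members under 'word/'."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return sorted(n for n in archive.namelist() if n.startswith("word/"))


def read_part(docx_bytes, name):
    """Return one archive member decoded as UTF-8 text."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.read(name).decode("utf-8")


def make_fake_docx():
    """Build a tiny stand-in archive (NOT a valid .docx) with a similar layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("word/footnotes.xml", "<footnotes/>")
    return buf.getvalue()


if __name__ == "__main__":
    print(list_word_parts(make_fake_docx()))
```

For a real file, zipfile.ZipFile("testdoc.docx").extractall("testdoc") should unpack everything into a directory called testdoc, much like unzip does.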

Second, here’s the XSL transformation to save in a file:

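<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<xsl:output method="text" encoding="UTF-8"/>

<xsl:template match="/w:document">
 <xsl:apply-templates/>
</xsl:template>

<xsl:template match="w:body">
 <xsl:apply-templates/>
</xsl:template>

<!-- a paragraph: its runs, then a blank line unless it is the last node -->
<xsl:template match="w:p">
<xsl:apply-templates/><xsl:if test="position()!=last()"><xsl:text>

</xsl:text></xsl:if>
</xsl:template>

<!-- a run: an optional footnote, then the text wrapped in bold and/or italics -->
<xsl:template match="w:r">
 <xsl:if test="w:footnoteReference"><xsl:text>\footnote{</xsl:text>
 <xsl:call-template name="footnote">
 <xsl:with-param name="fid"><xsl:value-of select="w:footnoteReference/@w:id"/></xsl:with-param>
 </xsl:call-template>
 <xsl:text>}</xsl:text></xsl:if>
 <xsl:if test="w:rPr/w:b"><xsl:text>\textbf{</xsl:text></xsl:if>
 <xsl:call-template name="pastb"/>
 <xsl:if test="w:rPr/w:b"><xsl:text>}</xsl:text></xsl:if>
</xsl:template>

<xsl:template name="pastb">
 <xsl:if test="w:rPr/w:i"><xsl:text>\textit{</xsl:text></xsl:if>
 <xsl:call-template name="pasti"/>
 <xsl:if test="w:rPr/w:i"><xsl:text>}</xsl:text></xsl:if>
</xsl:template>

<xsl:template name="pasti">
 <xsl:apply-templates select="w:t"/>
</xsl:template>

<!-- look up the footnote text by id in footnotes.xml, next to this stylesheet -->
<xsl:template name="footnote">
 <xsl:param name="fid"/>
 <xsl:apply-templates select="document('footnotes.xml')/w:footnotes/w:footnote[@w:id=$fid]"/>
</xsl:template>

<xsl:template match="w:footnote">
 <xsl:apply-templates select="w:p"/>
</xsl:template>

</xsl:stylesheet>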


You can save that in a file called docxtolatex.xsl in the ‘word’ directory. Then, in that directory, run ‘xsltproc docxtolatex.xsl document.xml’, and you’ll have your screen full of the document, in LaTeX markup.

You’ll notice that this XSLT only converts bold, italics and footnotes. But then again, that’s often all I need to convert…
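To get a feel for the bold and italics mapping without a real document at hand, the same idea can be mimicked in a few lines of Python. This is a rough sketch using the standard xml.etree.ElementTree module rather than XSLT, it leaves footnotes out entirely, and the function names and the SAMPLE document are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# The WordprocessingML namespace that the w: prefix stands for.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def run_to_latex(run):
    """Wrap the text of one w:r in \\textit and/or \\textbf, bold outermost."""
    text = "".join(t.text or "" for t in run.findall("w:t", NS))
    if run.find("w:rPr/w:i", NS) is not None:
        text = "\\textit{" + text + "}"
    if run.find("w:rPr/w:b", NS) is not None:
        text = "\\textbf{" + text + "}"
    return text


def paragraphs_to_latex(document_xml):
    """Convert each w:p under w:body and join paragraphs with a blank line."""
    root = ET.fromstring(document_xml)
    paragraphs = root.findall("w:body/w:p", NS)
    return "\n\n".join(
        "".join(run_to_latex(r) for r in p.findall("w:r", NS))
        for p in paragraphs
    )


# A hand-written miniature document, far simpler than what Word produces.
SAMPLE = (
    '<w:document xmlns:w="' + W + '"><w:body>'
    '<w:p><w:r><w:t xml:space="preserve">Plain, </w:t></w:r>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r></w:p>'
    '<w:p><w:r><w:rPr><w:b/><w:i/></w:rPr><w:t>both</w:t></w:r></w:p>'
    '</w:body></w:document>'
)

if __name__ == "__main__":
    print(paragraphs_to_latex(SAMPLE))
```

The nesting order follows the stylesheet: \textbf goes on the outside and \textit on the inside, so a run that is both bold and italic comes out as \textbf{\textit{both}}.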