Create a Word document using the OpenXML SDK 2.0

** A French version of this article is available here**

Today, users expect to be able to export data from a system, and document generation is a recurring challenge in projects.

In this article I’ll explain how to use the latest version of the OpenXML SDK (2.0, April 2009 CTP) to generate Word 2007 documents (.docx).

Before that, let’s have a look at the document’s structure.

Structure of a “.docx”

A Word 2007 document is nothing more than a package that contains other files. Create a new document in Word and type a single word, like this:

OpenXML01

Save the document, then extract it as you would a ZIP archive to access the files it contains:

OpenXML02
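The same package can also be explored from code. The snippet below is a minimal sketch, assuming a reference to WindowsBase.dll (System.IO.Packaging); the class name, the `GetPartUris` helper and the default file name are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging; // requires a reference to WindowsBase.dll

class ListPackageParts
{
    // Hypothetical helper: returns the URI of every part stored in the package.
    public static List<string> GetPartUris(string path)
    {
        List<string> uris = new List<string>();
        using (Package package = Package.Open(path, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
            {
                uris.Add(part.Uri.ToString());
            }
        }
        return uris;
    }

    static void Main(string[] args)
    {
        // Hypothetical default: pass the .docx you saved from Word as the first argument.
        string path = args.Length > 0 ? args[0] : "Document.docx";
        foreach (string uri in GetPartUris(path))
        {
            Console.WriteLine(uri);
        }
    }
}
```

For a document saved by Word, the list should include the parts discussed in this section, such as the main document part and the styles part.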

The main files in this package are:

  • document.xml, which contains the whole structure and content of the document, as you can see in the snippet below:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document><!-- xmlns declarations omitted for readability -->
  <w:body>
    <w:p w:rsidR="00AA5D53" w:rsidRDefault="00DF5AC3">
      <w:r>
        <w:t>Paragraphe</w:t>
      </w:r>
    </w:p>
    <w:sectPr w:rsidR="00AA5D53" w:rsidSect="00AA5D53">
      <w:pgSz w:w="12240" w:h="15840"/>
      <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"
        w:header="720" w:footer="720" w:gutter="0"/>
      <w:cols w:space="720"/>
      <w:docGrid w:linePitch="360"/>
    </w:sectPr>
  </w:body>
</w:document>

Each of these tags is represented by a class in the OpenXML SDK. Installing the SDK gives you a new library named “DocumentFormat.OpenXml.dll”, which contains namespaces dedicated to manipulating Office documents. For example, you will use the DocumentFormat.OpenXml.Wordprocessing namespace for Word documents.

Here is the mapping between tags and .NET classes:

  • <w:document>: represented by the “Document” class; it is the document’s root.
  • <w:body>: represented by the “Body” class; it is the document’s body.
  • <w:p>: represented by the “Paragraph” class; it is a paragraph in your document.
  • <w:r>: represented by the “Run” class. You can’t add content directly to a paragraph: you add a run to the paragraph, then add content (text, line breaks, tabs…) to the run.
  • <w:t>: represented by the “Text” class; it is your text.

The other main file in the package is:

  • styles.xml: this file plays the role of a CSS stylesheet in a web application. It contains the definitions of all the styles that can be used in the document. Each style has a unique ID and can be referenced directly from the document.xml file.

Note: writing OpenXML files with .NET is simple and intuitive. It’s like writing HTML or XML, but with different tags. Always remember that text must be placed in a Text, inside a Run, inside a Paragraph.
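To make the Text / Run / Paragraph nesting tangible, here is a small sketch that builds one paragraph in memory and prints the markup the SDK generates for it. It assumes a reference to DocumentFormat.OpenXml.dll; the class and method names are hypothetical.

```csharp
using System;
using DocumentFormat.OpenXml.Wordprocessing;

class NestingExample
{
    // Hypothetical helper: a Text inside a Run inside a Paragraph.
    public static Paragraph BuildParagraph(string content)
    {
        return new Paragraph(
            new Run(
                new Text(content)
            )
        );
    }

    static void Main()
    {
        Paragraph paragraph = BuildParagraph("Hello");

        // OuterXml shows the <w:p>, <w:r> and <w:t> elements produced for this paragraph.
        Console.WriteLine(paragraph.OuterXml);

        // InnerText concatenates the text of all descendant elements.
        Console.WriteLine(paragraph.InnerText);
    }
}
```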

Create a simple Document

To make this article more concrete, I chose a real case: writing the order history of each customer of the Northwind company. Here is the DataContext I’ll use in the rest of the article:

OpenXML03

First, add two references to your project. The first one, “WindowsBase.dll”, gives you access to the System.IO.Packaging namespace, which contains the classes used to create Open XML packages. The second one, “DocumentFormat.OpenXml.dll”, contains all the classes for creating a Word document.

The first class to use is WordprocessingDocument, which represents the “.docx” package. You create an instance of this class by calling its static “Create” method:

WordprocessingDocument document = WordprocessingDocument.Create(
    @"C:\CustomersHistory.docx",
    WordprocessingDocumentType.Document
);

Now you need to compose the package. The first part to create is the “document.xml” file. To do so, call the AddMainDocumentPart method on the WordprocessingDocument instance:

MainDocumentPart mainDocumentPart = document.AddMainDocumentPart();

Then create the <w:document> root:

mainDocumentPart.Document = new Document();

And the document’s body:

Body documentBody = new Body();
mainDocumentPart.Document.Append(documentBody);

You have now created the document’s structure. Next, get the customers from the database so you can write the content of the document:

List<Customer> customers = null;

using (NorthwindDataContext db = new NorthwindDataContext())
{
    DataLoadOptions options = new DataLoadOptions();
    options.LoadWith<Customer>(c => c.Orders);

    db.LoadOptions = options;

    customers = db.Customers.ToList();
}

The first paragraph is the title of the document. Remember that creating a paragraph takes three instances:

  • Paragraph
  • Run
  • Text

Paragraph titleParagraphe = new Paragraph();
Run titleRun = new Run();
Text titleText = new Text("Northwind's Customers History");
titleRun.Append(titleText);
titleParagraphe.Append(titleRun);

Once the paragraph is created, add it to the document’s body by calling the “Append” method of the Body class:

documentBody.Append(titleParagraphe);

After that, I chose to create a paragraph for each customer, followed by a table that summarizes that customer’s orders:

foreach (var customer in customers)
{
    //paragraph for the name
    Paragraph customerNameParagraph = new Paragraph();
    Run customerNameRun = new Run();
    Text customerNameText = new Text(
        String.Format("#{0} : {1}",
            customer.CustomerID,
            customer.ContactName
        )
    );

    customerNameRun.Append(customerNameText);
    customerNameParagraph.Append(customerNameRun);

    //add the paragraph to the document’s body
    documentBody.Append(customerNameParagraph);

    //create a table :
    Table ordersTable = new Table();

    //loop on the orders
    foreach (var order in customer.Orders)
    {
        //create a row
        TableRow orderRow = new TableRow();

        //create a cell for the OrderID
        //note the same paragraph/run/text nesting here
        TableCell orderIDCell = new TableCell();
        orderIDCell.Append(
            new Paragraph(
                new Run(
                    new Text(order.OrderID.ToString())
                )
            )
        );

        //create a cell for the date
        TableCell orderDateCell = new TableCell();
        orderDateCell.Append(
            new Paragraph(
                new Run(
                    new Text(order.OrderDate.Value.ToShortDateString())
                )
            )
        );

        //add cells to the row
        orderRow.Append(orderIDCell, orderDateCell);
        //add the line to the table
        ordersTable.Append(orderRow);
    }

    //add the table to the document’s body
    documentBody.Append(ordersTable);
}
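Since every cell repeats the same Paragraph / Run / Text nesting, the row construction can be factored out. The sketch below assumes a hypothetical `CreateRow` helper (it is not part of the SDK) and uses made-up sample values; it also shows how a header row could be added with the same helper.

```csharp
using System;
using DocumentFormat.OpenXml.Wordprocessing;

class TableHelpers
{
    // Hypothetical helper: builds a row with one cell per string.
    public static TableRow CreateRow(params string[] cellTexts)
    {
        TableRow row = new TableRow();
        foreach (string cellText in cellTexts)
        {
            // Each cell still needs the Paragraph > Run > Text nesting.
            row.Append(new TableCell(new Paragraph(new Run(new Text(cellText)))));
        }
        return row;
    }

    static void Main()
    {
        Table ordersTable = new Table();
        ordersTable.Append(CreateRow("Order ID", "Order date")); // header row
        ordersTable.Append(CreateRow("10248", "7/4/1996"));      // sample values, for illustration only

        // Prints the <w:tbl> markup generated for the two rows.
        Console.WriteLine(ordersTable.OuterXml);
    }
}
```

In the loop above, the two cells could then be replaced by a single call such as `ordersTable.Append(CreateRow(order.OrderID.ToString(), order.OrderDate.Value.ToShortDateString()));`.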

The last step is to save the main document part and dispose of the WordprocessingDocument:

document.MainDocumentPart.Document.Save();
document.Dispose();

If you run this code, you’ll get a Word document that looks like the following screenshot:

OpenXML04

It will look nicer with some styles :)
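As a variation on the explicit Save and Dispose calls, the package can be wrapped in a using block so that it is closed even if an exception occurs while the content is being written. This is only a sketch, assuming the released SDK 2.0 or later; the class name, the `CreateDocument` helper and the output path are hypothetical.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class UsingExample
{
    // Hypothetical helper: creates a one-paragraph document at the given path.
    public static void CreateDocument(string path)
    {
        using (WordprocessingDocument document =
            WordprocessingDocument.Create(path, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainDocumentPart = document.AddMainDocumentPart();
            mainDocumentPart.Document = new Document(
                new Body(
                    new Paragraph(new Run(new Text("Northwind's Customers History")))
                )
            );

            // Write the element tree back into the document.xml part.
            mainDocumentPart.Document.Save();
        } // the package is disposed here, even if an exception was thrown above
    }

    static void Main()
    {
        // Hypothetical output path, relative to the working directory.
        CreateDocument("CustomersHistory.docx");
    }
}
```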

Styles management

There are two ways to use styles in an Open Xml document. The first one is to extract a style sheet from an existing document and use it as a base. For example, you can extract all the Office 2007 styles from the document you created earlier, when we looked at the structure of a “.docx”.

Once you have the “styles.xml” file, add it to your project and configure it as shown here:

OpenXML05

Now add a new part to the Open Xml package. To do that, use the generic AddNewPart<T> method of the MainDocumentPart class, where T is the StyleDefinitionsPart type:

StyleDefinitionsPart styleDefinitionsPart =
    mainDocumentPart.AddNewPart<StyleDefinitionsPart>();

Open a file stream on the styles.xml file and use it to load all the styles into your package:

// The using block closes the file once the styles have been copied into the part
using (FileStream stylesTemplate =
    new FileStream("styles.xml", FileMode.Open, FileAccess.Read))
{
    styleDefinitionsPart.FeedData(stylesTemplate);
}
styleDefinitionsPart.Styles.Save();
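Before referencing a style, it can be useful to check which style IDs the extracted sheet really contains, since the IDs may depend on the language of the Word installation that produced it. The following sketch lists them from a saved document; it assumes the released SDK 2.0 or later, and the class name, the `GetStyleIds` helper and the default path are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class ListStyleIds
{
    // Hypothetical helper: returns the IDs of all styles stored in the document's styles part.
    public static List<string> GetStyleIds(string path)
    {
        List<string> ids = new List<string>();

        // false = open the package read-only
        using (WordprocessingDocument document = WordprocessingDocument.Open(path, false))
        {
            StyleDefinitionsPart stylesPart = document.MainDocumentPart.StyleDefinitionsPart;
            if (stylesPart == null || stylesPart.Styles == null)
            {
                return ids; // no style sheet in this package
            }

            foreach (Style style in stylesPart.Styles.Elements<Style>())
            {
                if (style.StyleId != null)
                {
                    ids.Add(style.StyleId.Value);
                }
            }
        }
        return ids;
    }

    static void Main(string[] args)
    {
        // Hypothetical default: pass the generated .docx as the first argument.
        string path = args.Length > 0 ? args[0] : "CustomersHistory.docx";
        foreach (string id in GetStyleIds(path))
        {
            Console.WriteLine(id);
        }
    }
}
```

The values printed here are the ones to pass to ParagraphStyleId.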

To apply a style to a paragraph, use a ParagraphProperties instance. For example, to use the “Title” style for the title of the document, add this code:

ParagraphProperties titleProperties = new ParagraphProperties();
titleProperties.ParagraphStyleId = new ParagraphStyleId { Val = "Title" };

// <w:pPr> must be the first child of <w:p>, so insert it before the run
titleParagraphe.PrependChild(titleProperties);

To use the “Heading1” style for the name of each customer, add this code:

ParagraphProperties customerNameProperties = new ParagraphProperties();
customerNameProperties.ParagraphStyleId =
    new ParagraphStyleId { Val = "Heading1" };
// <w:pPr> must be the first child of <w:p>, so insert it before the run
customerNameParagraph.PrependChild(customerNameProperties);

Here is the result:

OpenXML06
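Because the schema expects the paragraph properties to be the first child of a paragraph, applying a style to a paragraph that already contains a run deserves a little care. The sketch below wraps this in a hypothetical `ApplyStyle` helper (not part of the SDK) that reuses the existing properties when there are some and otherwise inserts new ones in first position.

```csharp
using System;
using DocumentFormat.OpenXml.Wordprocessing;

class StyleHelpers
{
    // Hypothetical helper: sets the paragraph style, creating <w:pPr>
    // as the first child if the paragraph has none yet.
    public static void ApplyStyle(Paragraph paragraph, string styleId)
    {
        ParagraphProperties properties = paragraph.GetFirstChild<ParagraphProperties>();
        if (properties == null)
        {
            properties = new ParagraphProperties();
            paragraph.PrependChild(properties);
        }
        properties.ParagraphStyleId = new ParagraphStyleId { Val = styleId };
    }

    static void Main()
    {
        Paragraph title = new Paragraph(new Run(new Text("Northwind's Customers History")));
        ApplyStyle(title, "Title");

        // The printed markup shows <w:pPr> placed before <w:r>.
        Console.WriteLine(title.OuterXml);
    }
}
```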

The second way to use styles is to create your own style sheet directly. Here are the steps to follow:

  • Create the “styles.xml” part by calling AddNewPart<T>
  • Instantiate the collection of styles on the StyleDefinitionsPart:

stylePart.Styles = new Styles();

  • Create a style and add it to the collection:

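Style dateStyle = new Style();
// ParagraphStyleId refers to the style's ID (w:styleId), not to its display name
dateStyle.StyleId = "DateFormat";
dateStyle.Type = StyleValues.Paragraph;
// display name (this class is called StyleName in the released SDK 2.0)
dateStyle.Append(new Name { Val = "DateFormat" });

RunFonts font = new RunFonts();
font.Ascii = "Calibri";//font name

// children are appended in the order the schema expects inside <w:rPr>
RunProperties runProperties = new RunProperties();
runProperties.Append(font);//font
runProperties.Append(new Italic());//italic
runProperties.Append(new Color() { Val = "006600" });//color (hex RGB, dark green)
//size in half-points: 32 means 16 pt (Val is a string, "32", in the released SDK 2.0)
runProperties.Append(new FontSize { Val = 32 });

dateStyle.Append(runProperties);

stylePart.Styles.Append(dateStyle);
//write the style sheet back into the styles.xml part
stylePart.Styles.Save();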

  • Use the style

As you saw previously, you have to use a ParagraphProperties instance:

ParagraphProperties dateProperties = new ParagraphProperties();
dateProperties.ParagraphStyleId = new ParagraphStyleId
{
    Val = "DateFormat"
};

And add these properties to the paragraph:

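TableCell orderDateCell = new TableCell();
orderDateCell.Append(
    new Paragraph(
        // an element can have only one parent, so each paragraph gets its own copy
        (ParagraphProperties)dateProperties.CloneNode(true),
        new Run(
            new Text(order.OrderDate.Value.ToShortDateString())
        )
    )
);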

Result:

OpenXML07
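For readers using the released SDK 2.0 or later rather than the CTP, the same custom style can be sketched as follows. This is an assumption-laden illustration, not the code from the original sample: the class name and the `CreateDateStyle` helper are hypothetical, and the sketch relies on the `StyleName` and `StyleRunProperties` classes and on the string-typed `FontSize.Val` of the released SDK.

```csharp
using System;
using DocumentFormat.OpenXml.Wordprocessing;

class CustomStyleExample
{
    // Hypothetical helper: builds the "DateFormat" paragraph style.
    public static Style CreateDateStyle()
    {
        Style style = new Style
        {
            Type = StyleValues.Paragraph, // paragraph style, referenced through ParagraphStyleId
            StyleId = "DateFormat",       // the ID that ParagraphStyleId.Val must match
            CustomStyle = true            // marks the style as user-defined
        };
        style.Append(new StyleName { Val = "DateFormat" }); // display name shown in Word

        // Children are listed in the order the schema expects inside <w:rPr>.
        style.Append(new StyleRunProperties(
            new RunFonts { Ascii = "Calibri" },
            new Italic(),
            new Color { Val = "006600" },
            new FontSize { Val = "32" } // half-points: 32 means 16 pt
        ));
        return style;
    }

    static void Main()
    {
        Styles styles = new Styles(CreateDateStyle());

        // Prints the <w:styles> markup that would be stored in styles.xml.
        Console.WriteLine(styles.OuterXml);
    }
}
```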

The Open Xml SDK is a very powerful tool for working with the Office 2007 formats. Here we’ve covered the creation of Word files, but the same classes can also read files, and the SDK handles other formats such as PowerPoint (.pptx) or Excel (.xlsx).
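As an illustration of the reading side, here is a minimal sketch that opens a document read-only and prints the text of each paragraph, including those inside table cells. It assumes the released SDK 2.0 or later; the class name, the `ReadParagraphs` helper and the default path are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class ReadDocument
{
    // Hypothetical helper: returns the text of every paragraph in the document body.
    public static List<string> ReadParagraphs(string path)
    {
        List<string> texts = new List<string>();

        // false = open the package read-only
        using (WordprocessingDocument document = WordprocessingDocument.Open(path, false))
        {
            Body body = document.MainDocumentPart.Document.Body;
            foreach (Paragraph paragraph in body.Descendants<Paragraph>())
            {
                texts.Add(paragraph.InnerText);
            }
        }
        return texts;
    }

    static void Main(string[] args)
    {
        // Hypothetical default: pass the .docx to read as the first argument.
        string path = args.Length > 0 ? args[0] : "CustomersHistory.docx";
        foreach (string text in ReadParagraphs(path))
        {
            Console.WriteLine(text);
        }
    }
}
```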

You can find the source code here.

Bye.

Julien Corioland
