In the previous blog post I explained the reasons for exporting the LibreOffice Guides to the XHTML format, and some of the issues involved. Now it is time to give more technical details.
I chose the writer2xhtml extension, available on the Extensions website, because the HTML5 it produces looks less cluttered than the native XHTML export. Nevertheless, it is still necessary to add a few extra HTML5 lines to load the CSS files and a JavaScript file.
Invisible changes in chapter files that can go upstream
Some changes should go upstream because they do not change the layout of the resulting PDF or ODT book.
Each image must be anchored “as character” in the document. The image then behaves as a character and must be alone in its paragraph. The paragraph must be centered on the page, using a style that aligns in the center, for example the “Figure” paragraph style. The image caption paragraph must have the “Caption” style. The wrapping frame that holds the image and the caption must also be anchored “as character” in a paragraph with the “Figure” style. This arrangement is transparent when producing ODT, PDF and HTML5 documents.
The headings of Tips, Notes and Cautions use a graphic as the bullet. Many of these paragraphs have the bullet enabled by direct formatting; this is invisible to the user in LibreOffice, but shows up as an ugly black circle when exporting to HTML5.
Recommended changes that should go upstream
Create a table style, or copy it from the table in the copyright section. Name the style as you want; custom table styles are stored in your user profile. Apply this style to all tables in the chapter. Open the table properties of each table and set the alignment to “Centered” and the table width to 90%.
Remove cross references to pages, for example “See figure 12 on page 30”. Preferably, use “above” or “below” in the reference instead.
Add the sections described in steps 2, 3 and 4 below (SEC_LOGO, SEC_GUIDE and SEC_TITLE).
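Page cross references like the one quoted above are easy to miss in a long chapter. The following is only a minimal sketch of how they could be listed, assuming the chapter text has already been extracted as a list of plain-text paragraphs. The pattern and the function name `find_page_references` are hypothetical and cover only the English "on page N" wording.

```python
import re

# Hypothetical pattern: matches "on page 30" or "on pages 30-32".
# Other wordings or languages would need their own patterns.
PAGE_REF = re.compile(r"\bon pages?\s+\d+(?:\s*[-–]\s*\d+)?", re.IGNORECASE)


def find_page_references(paragraphs):
    """Return (paragraph_index, matched_text) for every page reference found."""
    hits = []
    for index, text in enumerate(paragraphs):
        for match in PAGE_REF.finditer(text):
            hits.append((index, match.group(0)))
    return hits
```

Each hit still has to be reviewed and reworded by hand in Writer; the sketch only points to where to look.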
Changes in the working copy of the chapter file
The original chapter file is optimized for a book format. We need to prepare it for export to HTML5, where pagination is different and mostly replaced by smooth scrolling. This involves several steps that change only the layout and some formatting, not the chapter contents.
With the images anchored as explained above (a step that may require manual work), move the caption above the image. This can be done quickly by placing the cursor in the caption text and pressing Ctrl+Alt+Up Arrow to swap the caption paragraph with the image paragraph. Some chapters have dozens of images, so it would be nice to have a script that executes this swap in bulk.
Wrap the top LibreOffice logo in a Section named “SEC_LOGO”.
Wrap the Guide name in a section named “SEC_GUIDE”.
Wrap the chapter title in a section named “SEC_TITLE”.
Delete the existing table of contents.
Select the text from the copyright heading to the end of the chapter and wrap in a section named “SEC_DISPLAYAREA”. Ensure you leave some empty paragraphs after the end of the section.
At the bottom of the chapter, after the SEC_DISPLAYAREA section, add 6 new empty sections: SEC_TOC, SEC_BOOK_TOC, SEC_SEARCH, SEC_IMPRINT, SEC_DONATION, SEC_NAV. You can create these 6 sections in an empty document and save them as an AutoText entry, so that all sections can be inserted with a single command using AutoText (Ctrl+F3).
Insert the chapter table of contents in the SEC_TOC section.
Change the template of the chapter to the provided template odf2htmlv2.ott.
Review the ordered and unordered lists in the chapter. The new template may highlight spurious bullets and numbering inserted by direct formatting which, as explained above, are very hard to detect in the original ODT file. Some of this direct list formatting may also show up in the HTML5 output.
Save your working copy.
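Since the later CSS and JavaScript rely on the section names, it may be worth checking the saved working copy before exporting. The following is a minimal sketch, not part of the original workflow. It assumes the working copy is a regular ODT file, which is a ZIP archive containing content.xml, and that sections appear there as text:section elements with a text:name attribute. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF namespace for text elements such as text:section.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Section names taken from the steps above.
REQUIRED_SECTIONS = [
    "SEC_LOGO", "SEC_GUIDE", "SEC_TITLE", "SEC_DISPLAYAREA",
    "SEC_TOC", "SEC_BOOK_TOC", "SEC_SEARCH", "SEC_IMPRINT",
    "SEC_DONATION", "SEC_NAV",
]


def section_names(content_xml):
    """Return the names of all text:section elements, in document order."""
    root = ET.fromstring(content_xml)
    return [
        element.get(f"{{{TEXT_NS}}}name")
        for element in root.iter(f"{{{TEXT_NS}}}section")
    ]


def missing_sections(odt_path):
    """Return the required section names that are absent from the ODT file."""
    with zipfile.ZipFile(odt_path) as odt:
        content = odt.read("content.xml")
    found = set(section_names(content))
    return [name for name in REQUIRED_SECTIONS if name not in found]
```

For example, `missing_sections("GS7001-working.odt")` (a made-up file name) would return an empty list when all ten sections are present.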
Exporting to HTML5
The writer2xhtml extension adds a toolbar for exporting with a single click. The extension allows some customization, which is not used here. The export used the “original format” style and a 115% font size.
The export is very fast and gives no option to change the export name, so the exported file has the same name as the chapter file with html as the file extension, and overwrites any existing file with that name and extension.
By default the exported file opens in the system browser for inspection. The result is not yet what we want: we must apply specific CSS and JavaScript to rearrange the layout of the sections. These files are referenced in the HTML5 output just before the </head> closing tag.
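Because the export silently overwrites the previous HTML file, any manual edits made to it are lost on the next export. A small helper run before each export could keep a copy. This is a sketch only; the function name `backup_export` and the `.bak` suffix are assumptions, not something the extension provides.

```python
import shutil
from pathlib import Path


def backup_export(odt_path, suffix=".bak"):
    """Copy the HTML file that the next export would overwrite, if it exists.

    Returns the path of the backup, or None when there was nothing to copy.
    """
    html_path = Path(odt_path).with_suffix(".html")
    if not html_path.exists():
        return None
    backup_path = html_path.with_name(html_path.name + suffix)
    shutil.copy2(html_path, backup_path)
    return backup_path
```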
<link href="guideposition.css" rel="Stylesheet" type="text/css">
<link href="guideformats.css" rel="Stylesheet" type="text/css">
<script type="text/javascript" src="GS70.js" defer></script>
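Adding these lines by hand after every export quickly becomes tedious. The following is a minimal sketch of how the step could be automated, assuming the exported file is UTF-8 and contains a lowercase </head> tag. The function names `inject_head_lines` and `patch_file` are hypothetical.

```python
from pathlib import Path

# The three lines quoted above, one per line.
HEAD_LINES = (
    '<link href="guideposition.css" rel="Stylesheet" type="text/css">\n'
    '<link href="guideformats.css" rel="Stylesheet" type="text/css">\n'
    '<script type="text/javascript" src="GS70.js" defer></script>\n'
)


def inject_head_lines(html, extra=HEAD_LINES):
    """Insert `extra` just before the first </head>; do nothing if already there."""
    position = html.find("</head>")
    if position == -1:
        raise ValueError("no </head> closing tag found")
    if extra in html:
        return html  # already patched, keep the file unchanged
    return html[:position] + extra + html[position:]


def patch_file(path):
    """Patch an exported HTML file in place (assumes UTF-8 encoding)."""
    file = Path(path)
    patched = inject_head_lines(file.read_text(encoding="utf-8"))
    file.write_text(patched, encoding="utf-8")
```

The check for already-inserted lines makes the function safe to run twice on the same file, which helps when a chapter is exported repeatedly.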
The CSS files
Two extra CSS files were created. The first, guideposition.css, manages the position of the sections on the page and has provisions for other screen sizes, such as tablets. The second, guideformats.css, contains rules that override attributes of the sections such as lists, fonts, font size, colors, margins, padding and more.
The JavaScript file
This file fills the empty sections we added at the end of the chapter with contents: the donation block, the guide table of contents (to jump between chapters), a legal imprint, a search form (to be implemented) and more. The JavaScript file is common to all chapters and is specific to the guide.
Conclusion
Exporting the LibreOffice Guides to HTML is another way to offer rich content to the public. Guides in HTML format can be installed on the servers of schools, libraries, colleges and corporations alongside a PDF copy, to support a migration project.
The rich feature set of Writer, while allowing the creation of wonderful documents, is also a source of concern, not only when exporting to formats that are less flexible than ODF, but also when managing this excess of freedom. The recommended changes and the detection of hidden direct formatting in lists are examples. It becomes clear that a set of sanitizing scripts could help to remove spurious formatting and unused legacy styles, detect unwanted extra styles and adjust the objects in the documents.
When handling the full set of guides it is easy to dream of an office suite that can execute some "wishes" like "anchor all images as characters and center in line", "change position of caption in all frames to top", "format all tables with style 'guides' and align to center"... but that is for an office suite of the next generation!
Partial results of all this work can be viewed below.
The Getting Started Guide in HTML format.
Happy documenting!!!!