Monday, March 22, 2021

Exporting LibreOffice Guides to XHTML (Part II)

In the previous blog post I explained the reasons for exporting the LibreOffice Guides to the XHTML format and some of the issues involved. Now it is time to give more technical details.

I chose the writer2xhtml extension, available on the Extensions website, because the HTML5 it produces looks less cluttered than the output of the native XHTML export. Nevertheless, it is still necessary to add a few extra lines to the HTML5 output to load the CSS files and a JavaScript file.

Invisible changes in chapter files that can go upstream

Some changes should go upstream because they do not change the layout of the resulting PDF or ODT book.

Each image must be anchored “as character” in the document. The image then behaves as a character and must be the only content of its paragraph. The paragraph must be centered on the page, using a style that aligns to the center, for example the “Figure” paragraph style. The image caption paragraph must have the “Caption” style. The wrapping frame that holds the image and the caption must also be anchored “as character” in a paragraph with the “Figure” style. This arrangement is transparent when producing ODT, PDF and HTML5 documents.

The Tip, Note and Caution headings use graphics as bullets. Many of these paragraphs have the bullet enabled by direct formatting. This is invisible to the user in LibreOffice, but it shows up as an ugly black circle when exporting to HTML5.

Recommended changes that should go upstream

Create a table style, or copy it from the table in the copyright section. Name the style as you want; custom table styles are stored in your user profile. Apply this style to all tables in the chapter. Open the table properties of each table and set the alignment to “Centered” and the table width to 90%.

Remove cross references to pages, for example “See figure 12 on page 30”. At most, use “above” or “below” in the reference.
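Finding every page cross reference by eye is tedious in a long chapter. The following is a minimal sketch, not part of the original workflow, that lists them by reading content.xml from the ODT file. It assumes that page cross references are stored as reference fields carrying the ODF attribute text:reference-format="page"; the function name find_page_references is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF namespace of the text:* elements and attributes.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def find_page_references(odt_file):
    """Return (element name, displayed text) for each field formatted as a page number.

    odt_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    hits = []
    for element in root.iter():
        # Assumption: a page cross reference carries text:reference-format="page".
        if element.get(f"{{{TEXT_NS}}}reference-format") == "page":
            local_name = element.tag.split("}")[-1]
            hits.append((local_name, "".join(element.itertext())))
    return hits
```

Calling find_page_references("chapter.odt") on a working copy (the file name is only an example) returns an empty list once all page references have been removed.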

Add the sections described in steps 2, 3 and 4 below.

Changes in the working copy of the chapter file

The original chapter file is optimized for a book format. We need to prepare it for export to HTML5, where pagination is different and navigation is mostly by smooth scrolling. This involves several steps, but they do not change the chapter contents, only the layout and some formatting.

  1. With the images anchored as explained above – a step that may require manual work – move the caption above the image. This can be quickly done by placing the cursor in the caption text and pressing Ctrl+Alt+Up Arrow, to swap the caption paragraph with the image paragraph. Some chapters have dozens of images, so it would be nice to have a script to execute this swap in bulk.

  2. Wrap the top LibreOffice logo in a Section named “SEC_LOGO”.

  3. Wrap the Guide name in a section named “SEC_GUIDE”.

  4. Wrap the chapter title in a section named “SEC_TITLE”.

  5. Delete the existing table of contents.

  6. Select the text from the copyright heading to the end of the chapter and wrap in a section named “SEC_DISPLAYAREA”. Ensure you leave some empty paragraphs after the end of the section.

  7. At the bottom of the chapter, after the SEC_DISPLAYAREA section, add 6 new empty sections: SEC_TOC, SEC_BOOK_TOC, SEC_SEARCH, SEC_IMPRINT, SEC_DONATION, SEC_NAV. You can create these 6 sections in an empty document and store it as AutoText, so all sections can be inserted with a single command using AutoText (Ctrl+F3).

  8. Insert the chapter table of contents in the SEC_TOC section.

  9. Change the template of the chapter to the provided template odf2htmlv2.ott.

  10. Review the ordered and unordered lists in the chapter. The new template may highlight spurious bullets and numbering inserted by direct formatting, which, as explained above, are very hard to detect in the original ODT file. Some of this direct list formatting may also be detected in the HTML5 output.

Save your working copy.
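With ten named sections per chapter, it is easy to forget one or mistype a name. The sketch below, which is an illustration and not part of the original tooling, reads content.xml from the saved working copy and reports which of the section names listed in the steps above are missing. The function name missing_sections is hypothetical; the section names are the ones from the steps.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF namespace of the text:* elements and attributes.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Section names taken from steps 2 to 7 above.
REQUIRED_SECTIONS = [
    "SEC_LOGO", "SEC_GUIDE", "SEC_TITLE", "SEC_DISPLAYAREA",
    "SEC_TOC", "SEC_BOOK_TOC", "SEC_SEARCH", "SEC_IMPRINT",
    "SEC_DONATION", "SEC_NAV",
]


def missing_sections(odt_file):
    """Return the required section names that do not appear in the ODT file.

    odt_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    # Writer sections are stored as <text:section text:name="...">.
    found = {
        section.get(f"{{{TEXT_NS}}}name")
        for section in root.iter(f"{{{TEXT_NS}}}section")
    }
    return [name for name in REQUIRED_SECTIONS if name not in found]
```

An empty list means all sections are present. The check only looks at names, so it does not verify that each section wraps the right content.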

Exporting to HTML5

The writer2xhtml extension adds a toolbar for exporting with one click. The extension allows some customization, not used here. The export used the “original format” style and a 115% font size.

The export is very fast and offers no way to change the output name: the exported file has the same name as the source with html as its file extension, and it overwrites any existing file with that name and extension.
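Because the export silently overwrites a previous result, any manual edits made to an earlier HTML file would be lost. A minimal sketch of a safeguard, to be run before exporting, is shown here. It is not part of the extension; the function name backup_existing_export and the .bak suffix are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def backup_existing_export(odt_path):
    """Copy an existing export next to the ODT file to <name>.html.bak.

    Returns the backup path, or None when there is no earlier export.
    """
    # The export uses the same file name with html as extension.
    html_path = Path(odt_path).with_suffix(".html")
    if not html_path.exists():
        return None
    backup_path = html_path.with_name(html_path.name + ".bak")
    # copy2 also preserves the modification time of the old export.
    shutil.copy2(html_path, backup_path)
    return backup_path
```

Only one backup generation is kept; running it again replaces the previous .bak file.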

By default the exported file opens in the system browser for inspection. The result is not yet what we want: we must apply specific CSS and JavaScript to rearrange the layout of the sections. These files are loaded by adding the following lines to the HTML5 output just before the </head> closing tag.

<link href="guideposition.css" rel="Stylesheet" type="text/css">
<link href="guideformats.css" rel="Stylesheet" type="text/css">
<script type="text/javascript" src="GS70.js" defer></script>
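Since every export overwrites the file, these three lines have to be added again each time. The step can be automated with a few lines of Python; the sketch below is one possible way to do it and is not the script used for the guides. The function name add_head_lines is hypothetical, and the inserted lines are copied verbatim from above.

```python
import re

# The three lines from the post, inserted verbatim.
HEAD_LINES = (
    '<link href="guideposition.css" rel="Stylesheet" type="text/css">\n'
    '<link href="guideformats.css" rel="Stylesheet" type="text/css">\n'
    '<script type="text/javascript" src="GS70.js" defer></script>\n'
)


def add_head_lines(html_text):
    """Return html_text with HEAD_LINES inserted just before </head>."""
    if "guideposition.css" in html_text:
        # Already patched: leave the text unchanged.
        return html_text
    match = re.search(r"</head\s*>", html_text, re.IGNORECASE)
    if match is None:
        raise ValueError("no </head> closing tag found")
    return html_text[:match.start()] + HEAD_LINES + html_text[match.start():]
```

To patch a file, read it as text, pass the text through add_head_lines and write the result back. The function does nothing on a file that already loads guideposition.css, so it is safe to run twice.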

The CSS files

Two extra CSS files were created. The first, guideposition.css, manages the position of the sections on the page and has provisions for handling other screen sizes, such as tablets. The second, guideformats.css, contains rules that override some attributes of the sections, such as lists, fonts, font sizes, colors, margins, padding and more.

The JavaScript file

This file fills the empty sections we added at the end of the chapter with content: a donation block, the guide table of contents (to jump between chapters), a legal imprint, a search form (to be implemented) and more. The JavaScript file is common to all chapters and is specific to the guide.

Conclusion

Exporting the LibreOffice Guides to HTML is another way to offer rich content to the public. Guides in HTML format can be installed on the servers of schools, libraries, colleges and corporations alongside a PDF copy, to support a migration project.

The rich feature set of Writer, while allowing the creation of wonderful documents, is also a source of concern, not only when exporting to formats that are less flexible than ODF, but also when managing the excess of freedom. The recommended changes and the detection of hidden direct formatting in lists are examples. It becomes clear that a set of sanitizing scripts can help to remove spurious formatting and unused legacy styles, detect unwanted extra styles and adjust the objects in the documents.

When handling the full set of guides it is easy to dream of an office suite that can execute some "wishes" like "anchor all images as characters and center in line", "change position of caption in all frames to top", "format all tables with style 'guides' and align to center"... but that is for an office suite of the next generation!!!

Partial results of all this work can be seen below.

The getting started guide in HTML format.

The javascript file

The css files: css1 and css2

The odf2xhtml.ott template

The writer2xhtml extension

 Happy documenting!!!!

 

 

Monday, March 1, 2021

Exporting LibreOffice Guides to HTML (Part I)

LibreOffice is an open source office suite full of tricky secrets. One of my favorites is the possibility to export a text document to XHTML or HTML5, both of which are W3C standards supported by most modern web browsers.

But you, the reader, will certainly ask: If I have the Guides in ODT and PDF file formats, why do I need another format? Why spend energy adding another medium for the LibreOffice Guides?


There are advantages and drawbacks to the endeavor. On the thumbs-up side, the community gets a way to read the guides without actually downloading the PDF or ODT file, and the contents can be accessed with the browser's navigation tools (including bookmarking and more). One example is the current ODF Standard files exported to XHTML, available at the OASIS website.

A second advantage is that (X)HTML pages can be crawled and indexed by search engine robots, so the LibreOffice Guides can be found on the search results pages of Bing, Google, DuckDuckGo and others.

Another exciting possibility for distributing the guides in (X)HTML format is that they could be installed on the intranets of schools, colleges and universities and public libraries, as well as on community, public administration and private company websites. The files are static and don't need a server-side scripting language such as PHP or ASP. Distributing the rich contents of the LibreOffice Guides in a browser-readable format will add value to every LibreOffice migration project.

One critical factor in the success of a LibreOffice migration project is how quickly users can transition to the new software, and the value of readily available, easily accessible documentation in different forms should not be underestimated.

How difficult is it to convert the Guides to an (X)HTML format?

My experience is that there is some work to do on the ODT side, and some work on the exported (X)HTML. The nice part is that these changes are small and can be partially automated.

LibreOffice has an interesting XHTML export filter. The developers did their best to preserve formatting and document fidelity between different rich text output formats. A second tool I tried is the nice extension writer2xhtml, which also has interesting features.

However, reading contents in a browser (or even on a tablet or a mobile phone) requires scrolling instead of the usual page turning, as in a printed book.

The layout of the document's content must be adapted to the browser's navigation actions. This requires the layout to be adjusted for on-screen viewing. Besides, it is interesting to also adapt the contents to tablets and perhaps mobile phones.

Luckily, all the elements for navigation exist in the ODT file; they are just in the wrong position when exported to XHTML. The approach is to wrap these elements in sections with specific names. After being exported to XHTML, these sections are mapped to <div id="name">...</div> elements and can be accessed by both CSS and JavaScript for pagination and layout.
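To see which sections actually arrived in the exported file, the div ids can be listed with a short script. This is a minimal sketch using only the Python standard library, assuming the export maps each section name to a div id as described above; the names DivIdCollector and section_div_ids are hypothetical.

```python
from html.parser import HTMLParser


class DivIdCollector(HTMLParser):
    """Collect the id attribute of every <div> in document order."""

    def __init__(self):
        super().__init__()
        self.div_ids = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values are kept as-is.
        if tag == "div":
            for name, value in attrs:
                if name == "id" and value:
                    self.div_ids.append(value)


def section_div_ids(html_text):
    """Return the ids of all <div> elements found in html_text."""
    collector = DivIdCollector()
    collector.feed(html_text)
    collector.close()
    return collector.div_ids
```

Comparing the returned list with the section names used in the ODT file shows quickly whether a section was lost or renamed during export.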

Here is one layout among many alternatives, for a simple export of our Guides to a browser page layout.

Besides the existing sections in the chapter, we can add other blocks with content of interest, for example a donation section or a search form for either an external or an internal search, such as Xapian and Omindex.

In the next post, I'll describe the changes needed in the Guide templates and discuss some of the alternate approaches for the task.

Stay tuned!