Although WYSIWYG [1] web page editors make it much easier for the general user to create and maintain web content, there are times when I just want to be able to create and edit some HTML directly.

The simplest option, using a word processing application to export HTML, is a poor choice; the underlying paradigm of word processors is antithetical to good HTML. Fortunately there are alternatives, such as the Composer component of the Seamonkey project [2].

Why bother

Creating web content that renders your design appropriately on HTML displays requires you to translate mentally between the result you want and the source HTML needed to produce it (if not in fine detail, at least in concept).

Many WYSIWYG editors allow you to edit HTML tags directly to make specific adjustments to your output, but unless you have a working knowledge of HTML (and of CSS syntax for style sheets) this sort of direct editing is a daunting prospect.

Word processor mind-set

The problem seems to be that word processors (understandably) try to produce HTML that looks as much like the original document as possible, on the assumption that every visual element in the document is intentional rather than merely an application default. So things like font, paragraph spacing and margins, which the author may not even have considered in the document, are reproduced in the HTML with the associated tag pollution.

This is a fundamentally different approach from that of markup languages (like HTML), which describe the content and expect different display devices to handle the presentation details according to their individual capabilities and user preferences.
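To make the contrast concrete, here is a minimal Python sketch that strips presentation-only markup from a fragment and leaves the content markup behind. It is an illustration only, not the filter any particular tool uses: the `MarkupStripper` class, the `strip_presentation` helper, the sample fragment and the lists of tags and attributes treated as "presentational" are all assumptions made for this example.

```python
import html
from html.parser import HTMLParser

# Assumption for this sketch: these attributes and tags carry presentation only.
PRESENTATIONAL_ATTRS = {"style", "class", "align", "lang"}
PRESENTATIONAL_TAGS = {"font", "span"}


class MarkupStripper(HTMLParser):
    """Rebuild the markup while dropping presentation-only tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in PRESENTATIONAL_TAGS:
            return  # drop the wrapper tag, keep whatever it contains
        kept = []
        for name, value in attrs:
            if name in PRESENTATIONAL_ATTRS:
                continue
            if value is None:
                kept.append(f" {name}")  # boolean attribute with no value
            else:
                kept.append(f' {name}="{html.escape(value, quote=True)}"')
        self.parts.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag not in PRESENTATIONAL_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape the text.
        self.parts.append(html.escape(data, quote=False))


def strip_presentation(markup):
    """Return markup with presentation-only tags and attributes removed."""
    parser = MarkupStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


# Hypothetical fragment in the style a word processor might export.
polluted = (
    '<p class="MsoNormal" style="margin:0cm">'
    '<font face="Arial"><span lang="EN-AU">Hello</span></font></p>'
)
cleaned = strip_presentation(polluted)
print(cleaned)
print(len(polluted), "bytes before,", len(cleaned), "bytes after")
```

The sketch ignores comments, doctypes and self-closing tags, so it is a demonstration of the idea rather than a usable cleaner.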

Web pages

As a test, here are the outputs of an HTML page with a simple 2 x 2 table created in various editors using the table defaults and no particular formatting (other than applying a border where the default is to omit one).

Raw HTML

raw html sample
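As a rough idea of what such a hand-written page contains, the following Python sketch generates a minimal page with a bordered 2 x 2 table and measures its size. The `build_table_page` helper, the cell contents and the exact markup are assumptions for illustration, so the byte count it prints will not necessarily match the figure measured for the original file.

```python
import html


def build_table_page(rows, border=1):
    """Return a minimal HTML page holding one table (hypothetical helper)."""
    body_rows = "\n".join(
        "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return (
        "<html>\n<body>\n"
        f'<table border="{border}">\n{body_rows}\n</table>\n'
        "</body>\n</html>\n"
    )


# Placeholder cell text; no width is set, so the table sizes to its contents.
page = build_table_page([["A1", "B1"], ["A2", "B2"]])
size_in_bytes = len(page.encode("utf-8"))

print(page)
print(size_in_bytes, "bytes")
```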

MS Word [3]

MS word sample

OpenOffice [4]

OpenOffice sample

Seamonkey composer

Seamonkey sample

Visually, the main difference is in the table width. The raw HTML and Seamonkey tables have no default width, so they render with the width of the elements. OpenOffice defaults to 100% width. MS Word defaults to the page width (although the width is actually specified in the cells).

The border and spacing differences could have been addressed by applying formatting in the various editors (or setting application defaults).

More troubling is the difference in file size produced:

Editor        File size (bytes)
Plain HTML                  117
Seamonkey                   442
OpenOffice                1,296
MS Word                  22,928

This is not a particularly fair comparison, and the size differences will vary with different content, but the basic point is that word processors produce ugly HTML.
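To put the sizes in proportion, this short Python sketch expresses each file as a multiple of the plain HTML file. The byte counts are the ones reported above; the variable names are only for illustration. On these numbers the MS Word output is roughly 196 times the size of the hand-written file.

```python
# File sizes in bytes, as reported in the comparison above.
sizes = {
    "Plain HTML": 117,
    "Seamonkey": 442,
    "OpenOffice": 1296,
    "MS Word": 22928,
}

baseline = sizes["Plain HTML"]

# Size of each file relative to the hand-written page.
ratios = {editor: size / baseline for editor, size in sizes.items()}

for editor, ratio in ratios.items():
    print(f"{editor:<12}{ratio:7.1f}x")
```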

Look upon your works and weep

Most browsers and email programs have a 'View source' command that lets you see the raw HTML of the current page or message. This option is often available from a right-click context menu in browsers, but you might have to hunt it down in email programs.

Control-U generally works in applications using the Gecko layout engine (Firefox, Thunderbird, Seamonkey).

Right-click/View Source (and variations) works in Safari, IE, Outlook, Firefox and Seamonkey.

In Thunderbird menus, View/Message source works.

In Outlook 2010, with the message open in a separate window:

  • Right-click/View Source shows the HTML body (but not the headers)
  • File/Info/Properties shows the internet headers (but not the body)
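The split between headers and body can also be seen outside a mail program. As a minimal sketch, assuming you have the raw source of a message as text, Python's standard `email` package separates the two. The message below is a made-up example, not one taken from any real mailbox.

```python
from email import message_from_string
from email.policy import default

# Hypothetical raw message source: headers, a blank line, then the HTML body.
RAW_MESSAGE = (
    "From: sender@example.com\n"
    "To: reader@example.com\n"
    "Subject: Table test\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<html><body><p>Hello</p></body></html>\n"
)

message = message_from_string(RAW_MESSAGE, policy=default)

# The internet headers (what Outlook shows under Properties).
headers = {name: str(value) for name, value in message.items()}

# The HTML body (what View Source shows).
html_body = message.get_body(preferencelist=("html",)).get_content()

for name, value in headers.items():
    print(f"{name}: {value}")
print()
print(html_body)
```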

Why does it matter?

As well as avoiding the aesthetically offensive HTML produced by word processors, direct editing lets you craft your HTML to do exactly what you want, not just what the WYSIWYG interface allows.

What to do next

Get Seamonkey [2]

  • Has both WYSIWYG and HTML source editors operating on the same file.
  • Uses the same layout engine as Firefox, so you have a start on cross-browser testing.
  • Has an immediate HTML preview tab, so you can see the rendered result without having to save the file, switch to a browser and reload the page.

HTML source editing

This is the HTML that produces the comparison shown above.

Preview editing

Note the formatting buttons in the toolbar.

References

[1] WYSIWYG Wikipedia entry with some alternatives

[2] Seamonkey Project home page

[3] Microsoft Word, part of the MS Office suite.

There used to be a downloadable 'Office HTML Filter' for Word 2000 which apparently removed much of the extra HTML cruft created by MS Word, but it is no longer available.

[4] OpenOffice home page