The Markout Story

I want to explore a little thought experiment about how to translate a document into an HTML web page, but without any syntax or semantics.

It may sound a little crazy at first, but bear with me, it has a good ending. I have made up a little story to keep it interesting. Here goes.

One morning Peter walked in and asked me if I could translate his document into a web page. Well, I thought to myself, I know lots of markup languages inside out, HTML, Markdown, and many others, so how hard can it be? Sure, I'll take a look.

"Great" says Peter, "I have used Markout! It gets rid of all that messy markup stuff...".

Ah! I don't know that one, so I ask him, "What's the Markout syntax?".

"Oh" says Peter, "Markout is my own invention, it does not have any syntax or semantics, I just wrote what I wanted to say in a plain text editor".

This sounds like mission impossible! I try to explain to Peter that without any syntax I won't be able to know which lines are headers to be set in a big font, and so on, and so on.

"But just take a look" says Peter, "it's easy to read, it says what I want and it means what I say".

Well yes, I can see it is easy to read, and I see that the first half is in English and the second half is in French. Yes, this is Canada, and that's a translation of the same thing.

Peter says, "It's just plain text, why can't you just translate it into a web page?"

Well I suppose I could transliterate it into HTML, "But Peter, it would look just like your plain text document".

"Well that's a good start" says Peter, "at least I can publish it on the web and people can read it".

"Well OK, that's easy enough, come back tomorrow and you can take a look. I will translate it verbatim, including the format, blank lines, indents and all".

Markout Take One

Sure enough the transliterator is a tiny program, easy to write and specify, and it transliterates Peter's document into an HTML document in a snap. It is easy to read as a web page, but I don't expect Peter to be very happy about the presentation, it looks just like the original plain text.

It turns out that Peter has written each paragraph as a long line in his text editor, using word wrap. That's good, each line is now in an HTML block, and each block looks just like an HTML paragraph. Each paragraph is separated from the next paragraph by a blank line.

I can change the page width and the paragraph text flows into lines that fit the available page width. In a few places Peter has broken a paragraph into separate lines, but that's ok, it still looks like the original, and the start of each line starts at the left margin as expected.

Of course I had to do a little fiddling, the HTML white-space handling is rather quirky, and the format is easily lost. But HTML and CSS style sheets provide plenty of options to solve these problems, so with a little care my transliterator can preserve indents and blank lines.
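
A minimal sketch of what such a transliterator might look like, assuming one block element per source line and a single style rule to preserve white-space. The names escapeHtml and transliterate, the class names and the choice of div elements are all illustrative assumptions, not part of the story:

```javascript
// Hypothetical helper: escape the characters that HTML treats as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Turn every source line into one block element, verbatim.
// Blank lines become blocks holding a <br> so they keep their height.
function transliterate(source) {
  return source
    .split(/\r?\n/)
    .map((line) =>
      line.trim() === ""
        ? '<div class="line blank"><br></div>'
        : '<div class="line">' + escapeHtml(line) + "</div>"
    )
    .join("\n");
}

// Assumed style rule: pre-wrap keeps indents and runs of spaces,
// but still lets a long line reflow to the available page width.
const style = ".line { white-space: pre-wrap; }";

console.log(transliterate("Section: One\n\n  indented <text> & more"));
```

The pre-wrap value is one of several CSS white-space options that could be used here; it is chosen in this sketch because it keeps the indents while still allowing long paragraph lines to wrap.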

Next day Peter takes a look. He is happy that he can publish his document as a web page, and it is easy to read. It is a great start, but as I expected he really would like to be able to spiff it up a bit...

I tried to explain again: without any syntax, how could I do that? Peter has an idea. He has written 'Section: ...' at the start of each section header line, so he asks: "Why couldn't you use my first word tags to format the section headers?"

"Yes Peter, if you can write down what you have done then that will give me the syntax, and I can expand my transliterator and make it into a translator with those lines as HTML header elements."

Peter's face falls, "Yes" he says, "I could do that, but the French section has different words, and I also have a draft of a book I am writing, and that says 'Chapter: ...', and then there are some other technical notes, and they use ## at the start of a section header line, and ### for a subsection, and so on".

He laughs, "Isn't Markout great, I can write it just the way I want!"

Markout Take Two

To try and use Peter's tag words I wrote a little JavaScript transformer, running in the browser. It looks at each line and reads the first word so that it can put the header lines into a large bold font, or whatever Peter wants.

In fact the JavaScript only needs to mark the lines with the first word tag so that a CSS style rule can be applied. Peter is very happy: "Marvellous! Please, can you do that for me?"

It was easy, JavaScript has lots of helpers like jQuery to deal with the DOM (Document Object Model) that represents the HTML as a data structure. I just had to check the first word of each line against Peter's key words and put these key words into a class attribute in the line. The CSS style rules could then define how the lines are presented.

Then I realized that the JavaScript didn't need to know what words Peter had used, it can simply write the first word of every line into the class attribute for that line. Lots of lines start with words that Peter has not intended to have any significance in his Markout notation, but so what, there are no CSS style rules for these class attribute words, so they are presented without any style rules.

Now the same little JavaScript can be used for all Peter's documents, we just need some style rules for each document with rules for whatever first words Peter has used as tag words.

But there is a wrinkle: some of Peter's tag words should appear in the presentation, for example a Chapter: first word tag, but most of Peter's tag words, such as ## on a header line, should not appear in the presentation. So in addition to using the first word as a class attribute value in the HTML element the first word itself needs to be wrapped in an HTML element so that the CSS rules can choose to hide it, or not.
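
Here is a rough sketch of that idea, written as a plain string function so it could run in the transliterator as easily as in the browser. The function names, the "tag" class on the wrapper span and the sample style rules are assumptions made up for illustration:

```javascript
// Hypothetical helpers for escaping text and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
function escapeAttr(text) {
  return escapeHtml(text).replace(/"/g, "&quot;");
}

// Mark one line: the first word becomes the class of the line,
// and is also wrapped in its own span so a style rule can hide it.
function markLine(line) {
  const match = /^(\s*)(\S+)(.*)$/.exec(line);
  if (!match) return '<div class="line"></div>'; // blank line, no first word
  const [, indent, word, rest] = match;
  return (
    '<div class="' + escapeAttr(word) + '">' +
    escapeHtml(indent) +
    '<span class="tag">' + escapeHtml(word) + "</span>" +
    escapeHtml(rest) +
    "</div>"
  );
}

// Assumed per-document style rules. Attribute selectors are used
// so that tag words such as ## need no escaping in the selector.
const style = `
[class~="##"] { font-size: 2em; font-weight: bold; }
[class~="##"] > .tag { display: none; }   /* hide the ## itself */
[class~="Chapter:"] { font-size: 2em; }   /* keep the word Chapter: */
`;

console.log(markLine("## Background"));
console.log(markLine("Chapter: One"));
```

Nothing in markLine knows about ## or Chapter:, so only the style rules would change from one document to the next.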

This is so much better than trying to translate Peter's Markout tags into HTML, the transliterator is nice and simple, and it does not need to change regardless of the words Peter has chosen to use as tags. The transliterator is still the same as it was for Markout Take One.

Then I realized that I could move the JavaScript running in the browser back into the transliterator. It could mark the class of each line with the first word as it generated the HTML. Peter can use the same little transliterator for all his documents. Or is it a translator now? In any case, all Peter needs is a CSS style sheet for each document.

Peter is delighted, he thinks this is great, and he has spotted lots of other things he can do. He has block quote paragraphs that start with a > tag, and notes and asides and examples.

In the beginning was the word, and the word is all we need!

Markout Take Three

So Markout has no syntax or semantics, but we have found lots of semantic tags anyway -- the first word of any line may be used as a semantic tag for the CSS style sheet rules to use.

The first word tags can be easily extended to apply to indented blocks of text, which is great for showing examples of code, or verses of poetry, and lots of other things.

But hang on, won't all the first words of lines inside these inset blocks be flagged as semantic tags too? Peter is worried, "How does that make sense? I can't choose these words, and I want them presented as plain text, not as unexpected headers or other things".

No problem, the power of CSS style sheet rules can sort this out, the CSS rules can home in on elements nested in any context, and that allows us to present the same semantic tag word with a different style, depending on its context.

The essential point is that the transliterator has included all the source text, so a block of prose or verse can be presented verbatim in any style we like.

The style rules can get complicated, but a little more JavaScript can simplify things. It can modify or eliminate the class attribute values for all the first words in indented blocks.

The top level semantic tag is the first word on the first line above an indented block, and that is the tag word for all the lines in the indented block.
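
One way this rule could be worked out is sketched below, under the assumptions that "indented" means a line starting with white-space and that a blank line neither carries a tag nor ends a block. The function name tagLines and the shape of the returned objects are invented for the example:

```javascript
// For every line, work out which tag word applies to it.
// An unindented line is tagged with its own first word; an indented
// line inherits the tag of the nearest unindented line above it.
function tagLines(source) {
  let owner = null; // first word of the most recent unindented line
  return source.split(/\r?\n/).map((line) => {
    if (line.trim() === "") {
      return { tag: null, indented: false, text: line };
    }
    const indented = /^\s/.test(line);
    if (!indented) {
      owner = line.split(/\s+/)[0];
    }
    return { tag: owner, indented, text: line };
  });
}

const sample = [
  "Example: a little loop",
  "    for each line",
  "        print the line",
  "Next paragraph starts here",
].join("\n");

for (const entry of tagLines(sample)) {
  console.log(entry.tag, "|", entry.text);
}
```

With the tag worked out this way, the first words inside the inset block (for and print in the sample) never become class values, so no style rule can pick them up by accident.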

So using the first word in a line as a block tag can do almost anything Peter wants, but there is still a big problem. How can Peter write words in italics or bold, or any other semantic tags, when they are inside a line?

Well it turns out Peter has simply written these elements in brackets using the first word in brackets as his tag word. So he writes (em for emphasis), and (cite Alice In Wonderland) to cite a book. The first word inside brackets can be treated just like the first word in a line, it can be any semantic tag word that Peter wants to use.
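
The bracket tags could be marked in much the same way. This is only a sketch: it assumes round brackets that are not nested, it keeps the brackets and the tag word in the output so that style rules can decide whether to hide them, and the name markInline and the "tag" class are made up for the example:

```javascript
// Hypothetical helper: escape the characters that HTML treats as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap every "(word rest)" in a span whose class is the first word.
// The bracket and tag word go in their own spans so CSS can hide them.
function markInline(text) {
  const pattern = /\(([^\s()]+)(\s+)([^()]*)\)/g;
  return escapeHtml(text).replace(pattern, (whole, word, gap, rest) => {
    const cls = word.replace(/"/g, "&quot;");
    return (
      '<span class="' + cls + '">' +
      '<span class="tag">(' + word + gap + "</span>" +
      rest +
      '<span class="tag">)</span>' +
      "</span>"
    );
  });
}

// Assumed style rules for two of Peter's bracket tags.
const style = `
.em, .cite { font-style: italic; }
.em > .tag, .cite > .tag { display: none; }
`;

console.log(markInline("He read (cite Alice In Wonderland) twice"));
```

Ordinary brackets are marked too, and as with first words in a line that does no harm: with no style rule for their first word they are presented exactly as written.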

So there it is, Markout has no syntax or semantics, but any first word (in a line or in brackets), may be used as a semantic tag. It is only the CSS style sheet rules that give the first words any meaning in the HTML presentation.

The Markout notation is very flexible since the CSS style sheet can treat any first word as a semantic tag. It is also very robust: there is no real problem if some or all of the CSS style rules are missing, all the original source text appears just as Peter chose to write it.

The original plain text has naturally been written so that it is easy to read, it just lacked a stylish presentation.

Lots of people love Markdown, and they will be thinking that Peter has just reinvented Markdown and called it Markout. But not so. In Markout you can use the same Markdown symbols as first word tags, and they look fine. The difference is that in Markout the first word tags can be anything you like, and as many as you need.