York Haskell Compiler: March 2008

It looks like this blog has been silent for more than a year ;) In the light of the first milestone reached recently by one of Yhc subprojects, the Javascript backend, I am happy to break the silence and put some personal historical notes as well as notes about the current status of the project I have been working on for more than 1.5 years.

1. The "crazy idea"

I was not definitely the first to have a thought to convert Haskell to Javascript. One that I can now recall was a piece of text on this page (by Jared Updike, dated March 2006):


------------------
...In addition I was thinking it would be cool to compile Haskell to
JavaScript so then you could actually write arbitrary things in
Haskell and run them in browsers...
------------------

There was a number of others that I lost track of. Anyway, the idea was in the air, and seeing other people expressing it prompted me that it was worth trying.

2. First experiments

I started my experiments with Nhc98 in June 2006. The intermediate form of Haskell program before it was turned to bytecodes was called PosLambda. It was not very easy to understand it, but some primitive things worked out. Later I noticed that the
Hajax page appeared on Haskellwiki (the link points to its earliest version) which referred to some of my developments. I started my own Wiki page, STG in Javascript, sort of response to the Hajax page. The "STG in Javascript" page is still there, showing the progress during that time (Summer 2006). Later, another developer (nickname Vir) picked up the topic, however his work is completely independent from mine.

3. Yhc core

Some time in September 2006 I learned (thanks Neil) about Yhc Core: an intermediate format that Yhc produces during compilation, similar by its idea to GHC Core, but much simpler and with more stable format. Plus, it has an externalized form, so in order to process it there was no need to alter the code of the Haskell compiler itself. These benefits were very attractive, and Yhc Core had much in common with PosLambda, so transition of my converter was relatively painless. In October 2006, the sources of the Javascript converter were checked into the Yhc darcs repo, so the Javasscript backend "officially" became a subproject of Yhc.

4. The Roman echo

On November 16, 2006, the first runnable example was published. It was a web page with an input field. When a decimal or Roman number was typed in, it responded with the same number converted to Roman or decimal (conversion code by Malcolm Wallace), plus approximate time that the conversion took. First results were perhaps terribly slow: time shown might be seconds, yet it worked.

5. DOM interface

It was absolutely necessary to find a way how to call DOM functions from Haskell. In the first Roman echo example, these calls were manually coded, but for more than one hundred of them defined by the Web Consortium, that did not seem acceptable. In addition, accurate representation of DOM methods' argument type information was critical, as one of Haskell's benefits in Web application design is strong static typing (as opposed to Javascript with no static typing at all).

Happily, the Web consortium provides DOM interface specifications in OMG IDL syntax. And there is an utility that is capable of understanding DOM IDL: H/Direct (by Sigbjorn Finne). The way H/Direct reflected IDL interface inheritance was simple, but not very open-ended: all sub-interfaces of an interface were presented as constructors of an algebraic data type, which meant that in order to add an instance of an interface, the whole thing needed to be recompiled as Haskell does not have extensible algebraic types.

So, the the Haskell code generator was stripped off from original H/Direct code while the IDL parser remained in place. The new Haskell code generator transformed the hierarchy of IDL interfaces into hierarchy of Haskell type classes. This is a more open-ended solution, as instances to classes may be defined elsewhere without changes to class definition itself.

Finally, DOM IDL interface definitions downloaded from the Web Consortium website were passed through the modified H/Direct, yielding tens of modules (each DOM interface was placed in a separate module), preserving the type information as much as it was possible (indeed, were DOM interfaces designed with much regard to strong static typing anyway?).

Of course, some manual coding still was necessary (mainly around the type class defaulting problem), but its amount is incomparably smaller than the amount of Haskell code generated from the DOM IDL definitions.

6. Unicode

Surprisingly, Javascript code running in Web browser has almost no access to Unicode character properties. Case (upper to lower, and the opposite) conversions are of course available, plus something can be done using regular expressions, but that's mainly it. No direct access to Unicode character category, no titlecase conversion, etc. Some logic was borrowed from Hugs and GHC where such information was provided as a lookup table (by character code ranges). This resulted in about ~70 kbytes added to the Javascript runtime included in every Web page generated, but at least essentially the same Unicode character properties API that is available in Hugs and GHC became available to Haskell programs running in Web browser.

7. Application coding paradigm

At the moment DOM bindings were complete, it became possible to call DOM functions directly (in continuation-passing style). So Haskell program looked similarly to Javascript, and this way of coding seemed to require too much effort. It was the same old sequential programming. CPS might be replaced by monadic interface, but this did not change the matter. Some functional-reactive paradigm was necessary.

The Fudgets library inspired some experiments. At the first attempt, the lowest-level layer of Stream Processors was successfully ported (tested on Firefox). Unfortunately, the Fudgets library itself showed some execution overhead, and finally when tested on MSIE, significant memory leaks.

Later, CPS Monad was tried out (instead of plain CPS). Same situation: works on Firefox, leaks and freezes on MSIE.

Plain CPS, though, never had such problems. So, after several months of experiments, it was finally decided to stay with plain CPS, and try to find a suitable solution to describe static Web page layout in mostly declarative way while leaving some degree of freedom for developers as to how to define the interactive (dynamic) part of an application.

In addition, pseudo-threads (in fact, coroutines) and message boxes for inter-thread communication were included in the runtime making it easier to develop interactive, event-driven applications.

8. Haskell web toolkit

It turned out that a relatively thin layer of combinators on top of DOM may solve the problem of defining the static layout of Web page. Most of ideas were borrowed from the HTML and its successor XHTML libraries (by Andy Gill, Bjorn Bringert).

Static structure of a Web page is described as a hierarchy of widgets, some (containers, that correspond to HTML elements with opening and closing tags) may nest other widgets, and some are terminals (leaves, like text labels or images). So, developing such static layout does not differ much from writing HTML, only these constructs are not indeed static: powerful functions like map and fold may be used to operate over multiple widgets at once.

Dynamics of the user interface can be added later, and it is almost completely orthogonal to static design. Special elements called activators expose the same interface as other widgets (that is, may be nested in containers), but their semantics includes runtime operations on their "parent" widgets such as handling user-induced events, on-the-fly style and content manipulation, or data exchange over XMLHTTP.

The name picked, "Haskell Web Toolkit" unintentionally resembles "Google Web Toolkit", but indeed, they both serve the same purpose: bringing in some language not supported natively by Web browsers as a language to program for Web browsers. Weights and sizes of the two toolkits are of course incomparable ... yet ;)

9. The first milestone

As the Javascript backend's code matured, the idea appeared to set up a web service for everybody to submit their Haskell source code for compilation and instant loading into their Web browsers. Such service was implemented, based on CouchDB as intermediate storage. The service was quietly opened for public testing on March 4, 2008. Having such service available was considered reaching the first milestone in the Javascript backend project.

So, what has been achieved since the first "Roman echo" demo was published:

speed of generated Javascript was significantly increased, partly due to improvements in the code generator, partly thanks to the Yhc Core optimizer which was developed in the same time when Javascript backend

a tool to automatically generate Haskell bindings from OMG IDL interface definitions was created. It is currently investigated whether Javascript documentation tools such as JsDoc could be used to generate OMG IDL definitions for existing Javascript libraries: this would make integration of them with Haskell much easier

programming paradigm for Haskell Web applications was defined. The Haskell Web Toolkit library is not large at the moment, but it will definitely be extended with new pieces as usage of Haskell in Web programming gets wider

a proof-of-concept live Yhc Web service was launched, so everybody interested may try Haskell in their browsers without installation of any software

10. Acknowledgements

Thanks to the Yhc Team for helping understand some internals of the compiler. Special thanks to Neil Mitchell who developed the Yhc Core format and the Core tools library. This helped a lot to improve the quality of Javascript generated by the backend. Some people contacted me privately and suggested some ideas or identified bugs. I am not putting names here, but feel free to identify yourselves in the comments.

And most of all: interested developers are welcome! The more people contribute in this project, the more chance Haskell has to become wider used in Web applications whose importance grows over time.

York Haskell Compiler

Friday, March 28, 2008

Yhc/Javascript backend: personal/historical notes