Friday, January 20, 2006

Yhc vs YHC

We have now named most things, which is a good step forward (Yhc, Yhi, Yhe, Gyhe, Yho ...)

One final thing we need to name is Yhc itself. If you look at the previous posts you'll see I tend to name it Yhc, and Bob tends to name it YHC. I personally prefer Yhc, because it looks quite a lot nicer, all the demo logos that have been done have had "hc" in lower case, and YHC seems a lot more violent, while Yhc is more friendly.

Thoughts and opinions?

Sunday, January 15, 2006

Wiki v2

I have started moving some bits of the old wiki to the new wiki, there is quite a lot of content, so if anyone wants to help :) I'm also thinking the Yhc website should just be the wiki, no static pages at all - so all the static content probably needs moving as well.

Hopefully once all this is done we'll have the best documented compiler.

Update: all done, there should be nothing on the old wiki which isn't on the new one - I also reorganised quite a bit at the same time.

Wednesday, January 11, 2006

Lines of code

For anyone who's interested, currently the Yhc compiler is 23578 lines of Haskell code. This is just the compiler, not the runtime etc.

Thursday, January 05, 2006

More on Type Checking

Christmas was spent mostly researching methods of type checking, and trying to get a good handle on what I'm going to do. After much looking, I now have apparently three options:
  • Write a standard type checker based on the W algorithm.
  • Write one based on Olaf's compositional method.
  • Attempt to refine Neil's idea for coming up with simple rules to be fed to a solver.

I don't like the first option - it's been done before, it'll give us no advantage over any other compiler, and it's boring. Olaf's method seems to have two major advantages - clear code with an easy to understand algorithm, excellent error messages/debugging. Meanwhile Neil's method seems to be the black horse - we don't know what we'll get out of it. From the first attempt at it, it appears that clear code will be one of the advantages, but speed may be lacking.

I think I'm probably going to go with Olaf's method first - that way we get excellent error messages guaranteed (one of my primary aims). However, I do think there's genuine scope in Neil's solver method, and I think a second experimental type checker may be written later, as I think it has much greater scope for being able to deal with really really complex type extensions.

So that's my thoughts for the moment. I need to speak to Olaf about a few details of his checker, and then I'll get writing.


Tuesday, January 03, 2006

User Interface

The Yhc compiler takes most of its Haskell-compiling code from nhc98, and most of its backend/interpretter has been rewritten by Tom. The one bit that I think needs most attention is the user interface. Specifically, I see the following problems in the user interface (where user is a programmer, and interface is how they use yhc):
  • YHC_BASE_PATH - this trips pretty much everyone up
  • Path handling, where .hbc files go etc
  • Doesn't list which file its compiling.
  • Error messages are usually pretty bad, compiling many files at a time has made this worse, since now you don't even know the file the errors in
YHC_BASE_PATH is scheduled for elimination. Listing compiled files is trivial. Error messages are being slowly replaced, but the lack of global state in Haskell is making this quite a bit harder.

The one thing where there is less concensus as to what the desired behaviour should be is path handling - i.e. deciding where to put a file, and where to get it from. I have started a wiki page with my thoughts on this, which is here. Please add your thoughts, hopefully we'll come to a nice concensus, and have a nicely implemented user interface before long.

Monday, January 02, 2006

HsNub - update

I just finished HsNub, and while it works basically, its not all that I was hoping for. The basic problem is that lots of functions which are entirely different have the same bytecode! For example, selecting the first element out of a data structure is given as lots of different names, often field selectors. To be useful, it would need to incorporate some knowledge of the types as well.

Just to show an example of it running, when trying it on my regular expression library, tweaked to add two functions func1 and func2 that are just like id, I get the result:

{Data.Char-cont, YHC.Internal-_id, Prelude-id, Prelude-Prelude.Enum.Prelude.Char.fromEnum, Prelude-Prelude.Enum.Prelude.Char.toEnum, Prelude-Prelude.Integral.Prelude.Integer.toInteger, Prelude-Prelude.Num.Prelude.Integer.fromInteger, RegExpData-func2, RegExpData-func1}
(plus plenty of other equal functions)

So it works, but its just a bit too verbose, and your program must type, whereas the bytecode doesn't. Its also really slow, but I'm sure that could be fixed with a bit of effort.

One thing I did find is that a massive amount of the standard libraries are the same. Some statistics about the functions in the prelude:

1826 functions
701 distinct bytecode sequences (38%)
45025 bytecodes, 27584 of which are required (61%)
1363 distinct constant tables (75%)
4732 constant entires, 4512 of which are required (95%)

Maybe a better bytecode format would store each function as a pointer to a bytecode chunk, and a pointer to a constant table - that way different ones could be mixed and matched. I don't expect the standard compiler to do this (way too expensive, harms debugging), but its useful to be able to do as a standalone optimiser, and nicely cuts away at the memory required.