HsNub - update
I just finished HsNub, and while it works basically, its not all that I was hoping for. The basic problem is that lots of functions which are entirely different have the same bytecode! For example, selecting the first element out of a data structure is given as lots of different names, often field selectors. To be useful, it would need to incorporate some knowledge of the types as well.
Just to show an example of it running, when trying it on my regular expression library, tweaked to add two functions func1 and func2 that are just like id, I get the result:
{Data.Char-cont, YHC.Internal-_id, Prelude-id, Prelude-Prelude.Enum.Prelude.Char.fromEnum, Prelude-Prelude.Enum.Prelude.Char.toEnum, Prelude-Prelude.Integral.Prelude.Integer.toInteger, Prelude-Prelude.Num.Prelude.Integer.fromInteger, RegExpData-func2, RegExpData-func1}
(plus plenty of other equal functions)
So it works, but its just a bit too verbose, and your program must type, whereas the bytecode doesn't. Its also really slow, but I'm sure that could be fixed with a bit of effort.
One thing I did find is that a massive amount of the standard libraries are the same. Some statistics about the functions in the prelude:
1826 functions
701 distinct bytecode sequences (38%)
45025 bytecodes, 27584 of which are required (61%)
1363 distinct constant tables (75%)
4732 constant entires, 4512 of which are required (95%)
Maybe a better bytecode format would store each function as a pointer to a bytecode chunk, and a pointer to a constant table - that way different ones could be mixed and matched. I don't expect the standard compiler to do this (way too expensive, harms debugging), but its useful to be able to do as a standalone optimiser, and nicely cuts away at the memory required.
Just to show an example of it running, when trying it on my regular expression library, tweaked to add two functions func1 and func2 that are just like id, I get the result:
{Data.Char-cont, YHC.Internal-_id, Prelude-id, Prelude-Prelude.Enum.Prelude.Char.fromEnum, Prelude-Prelude.Enum.Prelude.Char.toEnum, Prelude-Prelude.Integral.Prelude.Integer.toInteger, Prelude-Prelude.Num.Prelude.Integer.fromInteger, RegExpData-func2, RegExpData-func1}
(plus plenty of other equal functions)
So it works, but its just a bit too verbose, and your program must type, whereas the bytecode doesn't. Its also really slow, but I'm sure that could be fixed with a bit of effort.
One thing I did find is that a massive amount of the standard libraries are the same. Some statistics about the functions in the prelude:
1826 functions
701 distinct bytecode sequences (38%)
45025 bytecodes, 27584 of which are required (61%)
1363 distinct constant tables (75%)
4732 constant entires, 4512 of which are required (95%)
Maybe a better bytecode format would store each function as a pointer to a bytecode chunk, and a pointer to a constant table - that way different ones could be mixed and matched. I don't expect the standard compiler to do this (way too expensive, harms debugging), but its useful to be able to do as a standalone optimiser, and nicely cuts away at the memory required.
1 Comments:
I don't think the synonym is a good idea - if your function does many things including head as a sub computation, then it won't be match. If all it does is head, then you do want it to match. I don't think these things are detected currently.
I also disagree on the minimum cutoff size, const is one of the smallest functions, but definately doesn't want to be duplicated (as it was with Yhc, named second) since it makes insertWith second less obviously insertWith const, which is known to be insert.
Obviously, before I can be sure, I'll need a full implementation that has the typechecking bit in it :)
Post a Comment
<< Home