Monday, September 8, 2008

End of SoC

It has been almost a month since my last post, and Google Summer of Code has come to an end. Although I have not achieved all my milestones, the project is nevertheless a success. The Lush interpreter is now powered by the new memory manager module, and the result of my SoC work is the basis for "Lush 2", the next major Lush release (see https://lush.svn.sourceforge.net/svnroot/lush/).

The garbage collector is faster now than a few weeks ago thanks to using a bitmap to record the state of the small object heap. Not maintaining free lists at all but using the bitmap for finding unused memory in the small object heap might give further improvements, albeit small ones. I will try it out some time, after describing the work in a brief tech report.
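The bitmap idea mentioned above could be sketched roughly as follows: one bit per small-object slot, set while the slot is in use, so finding unused memory becomes a scan for a zero bit. This is only an illustration with invented names and fixed sizes, not the actual MM code.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 1024                       /* slots in the small-object heap */
static unsigned char bitmap[NSLOTS / 8];  /* one bit per slot, 0 = free     */

static void slot_set_used(size_t i) { bitmap[i / 8] |=  (unsigned char)(1u << (i % 8)); }
static void slot_set_free(size_t i) { bitmap[i / 8] &= (unsigned char)~(1u << (i % 8)); }
static int  slot_in_use(size_t i)   { return (bitmap[i / 8] >> (i % 8)) & 1; }

/* Find the first free slot, or (size_t)-1 if the heap is full. */
static size_t find_free_slot(void)
{
    for (size_t i = 0; i < NSLOTS; i++)
        if (!slot_in_use(i))
            return i;
    return (size_t)-1;
}
```

With such a bitmap, the sweep phase can clear bits in bulk instead of threading objects onto free lists, which is presumably where the hoped-for (small) improvement would come from.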

Monday, August 11, 2008

Debugging and Tuning

Friday last week I had a major breakthrough: I saw the shell again after more than four weeks. Debugging was exactly as much fun as everybody says it is. It's particularly frustrating to spend five hours going from not having any clue which part of the system might be causing a crash, to narrowing it down to a small number of hypotheses, to finding a really stupid mistake that you can't believe you are responsible for. Gdb is a close friend now.

For testing the interpreter I was basically using the startup procedure: when the interpreter starts up, it reads in and parses a bunch of source files with definitions. So there surely are a couple more bugs that this "test" did not expose. Before I try running other programs to tease out more bugs, though, I will spend a little more time on performance tuning. Having to wait a minute or two until the interpreter got to the point where it broke while parsing the standard files was an annoyance while debugging last week (and besides, that kind of performance is unacceptable).

Three items are on my performance tuning to-do list for now:
  • Do some profiling and try to speed up the hot spots in the MM code
  • Implement asynchronous collect (so far there's only synchronous)
  • Implement a smarter collection trigger policy
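As a point of reference for the third item, a very simple trigger policy is to collect whenever the bytes allocated since the last collection exceed a fixed fraction of the heap. The sketch below is purely illustrative; every name in it is invented, and the real policy will likely need to adapt the threshold dynamically.

```c
#include <assert.h>
#include <stddef.h>

static size_t heap_size = 1 << 20;          /* total managed heap, in bytes   */
static size_t alloc_since_gc = 0;           /* bytes handed out since last GC */
static const double gc_trigger_ratio = 0.25;

/* Trigger a collection once allocations exceed a fraction of the heap. */
static int should_collect(void)
{
    return alloc_since_gc > (size_t)(gc_trigger_ratio * (double)heap_size);
}

static void note_allocation(size_t nbytes)  /* called from the allocator */
{
    alloc_since_gc += nbytes;
}

static void note_collection_done(void)      /* called after each GC cycle */
{
    alloc_since_gc = 0;
}
```

A smarter policy would also look at how much memory each collection actually reclaims and grow the heap when collections stop paying off.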

Tuesday, August 5, 2008

On Designing Interfaces

I talked a little bit about ATs in an earlier post. To summarize, ATs serve three different roles in the original Lush interpreter, namely
  1. As generic, unique identifiers for Lisp objects
  2. As carriers of runtime type information
  3. As carriers of information for memory management (reference counts)
In the branch that I am currently working on, the third point is no longer true and ATs serve only the first two purposes. In this post I want to talk about the first purpose.

There are many type-generic functions in the interpreter that work with many or all kinds of objects (creating a string representation, writing an object to a file, computing a hash code, etc.). Many other functions work with objects of a particular type only (opening a file given a name (string), slicing an array, arithmetic functions, etc.). A third kind of function is found in the class structure of Lisp objects: type-specific versions of functions that should work with all objects (dispose, creating a string representation, listeval, etc.).

The natural signature of a type-generic function includes an AT argument pointing to the object in question. Such functions may be implemented as dispatchers to type-specific functions of the third kind. Function signatures for the second kind may or may not include an AT.

When you look at the original Lush interpreter implementation, you will find that functions of the second and third kind have generic signatures as well. The ones of the second kind typically do a runtime type check; the ones of the third do not. I consider this bad design. If a function works with arguments of a particular type only, then the signature should say so, and the implementation should not waste (run)time doing type checks. This is one kind of change that I have been making since I started hacking Lush, and there is still a lot of work to be done on overhauling internal interfaces.

Yesterday I was churning through the dispose functions, which are functions of the third kind. Objects with non-trivial dispose functions need a finalizer to invoke the dispose function. Lush also has a function delete to explicitly destroy a Lisp object, whether or not it is still referenced. Delete calls an object's dispose function and then modifies the AT to make it look like a NIL. So a dispose function needs to be idempotent, since it may be called twice (once by delete and a second time by the finalizer).
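The usual way to get idempotence is a guard flag: the first call does the real work and sets the flag, later calls return immediately. A minimal sketch, with invented struct and function names (the real Lush dispose functions look different):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* A toy object owning a FILE handle; dispose may run twice. */
typedef struct file_object {
    FILE *fp;
    bool  disposed;
} file_object_t;

static void file_dispose(file_object_t *f)
{
    if (f->disposed)          /* second call (delete, then finalizer): no-op */
        return;
    if (f->fp) {
        fclose(f->fp);
        f->fp = NULL;         /* don't leave a dangling handle behind */
    }
    f->disposed = true;
}
```

Alternatively, one can make each teardown step individually safe (close-and-NULL, free-and-NULL), which works without an extra flag.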

Since I was going to look at all dispose functions I decided to change their signatures from generic to type specific along the way. When I got to dispose for OO objects, I hit a snag: This dispose invokes a user-defined destructor (if one has been defined), and at that point, I need to know the object's AT because the destructor is a Lisp function and evaluate has a generic signature. After making dispose type-specific, however, I don't have the object's AT anymore.

That led me to wonder if maybe I was wrong and there is some value in having all signatures generic. Anyway, I fixed it by adding a "back pointer" to the OO object structure (a pointer that points back to the AT referencing the object), which is not nice. Perhaps the dispose interface still isn't quite right?
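For concreteness, the back-pointer workaround amounts to something like the following, with illustrative names only (the real AT and object layouts in Lush differ):

```c
#include <assert.h>

struct at {                   /* the generic Lisp handle         */
    void *object;             /* points at the represented object */
};

typedef struct oo_object {
    struct at *backptr;       /* points back to the owning AT, so a
                                 type-specific dispose can still hand
                                 the AT to the generic evaluator     */
    /* ... class pointer, slots, etc. ... */
} oo_object_t;
```

The cost is one extra word per OO object and an invariant (AT and back pointer must stay in sync) that the rest of the interpreter has to maintain.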

Thursday, July 31, 2008

Crash course in Debugging

Debugging sucks. After implanting the new memory manager into the interpreter, nothing worked at first. And after over a week spent debugging, it is still not working. But yesterday I had a good run and slew over half a dozen bugs. Feels good.

This is my crash course in GDB. I have used GDB occasionally before, but I was always starting from zero because I use it so infrequently. This time around there are so many bugs to hunt down that I really need to use a lot of GDB's capabilities, and I need to use them for a while! That's how you learn a tool really well.

Tuesday, July 8, 2008

Easier tracking of transient objects

My API for tracking transient objects in MM is a little different from Nickle's. Consider the following three functions.
char *a (int n)
{
    assert(n >= 0);
    char *s = mm_allocv(mt_blob, n+1);
    s[n] = '\0';
    return s;
}

char *b (date_t d)
{
    char *s = a(127);
    sprintf(s, "Hello world on %s.\n", date2str(d));
    return s;
}

char *c (date_t d)
{
    MM_ANCHORED;
    char *s = a(127);
    sprintf(s, "Hello world on %s.\n", date2str(d));
    MM_RETURN(s);
}

Functions b and c are almost the same, except that in c the body is what I call an "anchored scope". Neither b nor c calls an allocation function directly, but a does.

Now, will mm_allocv in a push the address of the allocated memory onto the transient object stack or not? It depends. If you are not in an anchored scope and call a or b, then it won't. If you call c, it will. In other words, the allocation functions have context-dependent behavior: they push new addresses when executed in an anchored scope and don't push otherwise. Along the same lines, when exiting an anchored scope with MM_RETURN, the returned object will be pushed only if the return is into an(other) anchored scope.

I believe this behavior is more useful than that of the Nickle system. There are lots of little functions in Lush that create a new object. They typically do one or a small number of allocations and then initialize the object. I do not need to fortify all these functions with MM_ANCHORED; it is good enough to apply it to the functions that call such constructor functions. In essence, I believe I will need to use MM_ANCHORED and company less often and will find it easier to write correct code when using them.
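One way such context-dependent macros could be built is with a transient-object stack plus a global anchor-depth counter: the allocator pushes only when the depth is positive, and MM_RETURN pops the scope's entries and re-pushes the result only when the caller is itself anchored. This is a toy reconstruction under those assumptions; all names and the internals are invented and need not match the real MM implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define STACK_MAX 256
static void  *transient_stack[STACK_MAX];
static size_t stack_top = 0;      /* next free slot                    */
static int    anchor_depth = 0;   /* >0 means inside an anchored scope */

static void *mm_alloc_sketch(size_t n)
{
    void *p = malloc(n);
    if (anchor_depth > 0)         /* context-dependent push            */
        transient_stack[stack_top++] = p;
    return p;
}

/* Remember where this scope's transient entries begin. */
#define MM_ANCHORED  size_t mm_mark_ = stack_top; anchor_depth++

#define MM_RETURN(p)                                                  \
    do {                                                              \
        void *mm_res_ = (p);                                          \
        stack_top = mm_mark_;       /* drop this scope's entries */   \
        if (--anchor_depth > 0)     /* caller anchored: keep result */\
            transient_stack[stack_top++] = mm_res_;                   \
        return mm_res_;                                               \
    } while (0)

static void *make_obj(void)
{
    MM_ANCHORED;
    void *tmp = mm_alloc_sketch(8);  /* pushed while anchored; a real */
    void *res = mm_alloc_sketch(8);  /* collector would reclaim tmp   */
    (void)tmp;
    MM_RETURN(res);
}
```

Calling make_obj from unanchored code leaves the transient stack exactly as it was, which is the property that makes little constructor functions safe to write without any anchoring of their own.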

Wednesday, July 2, 2008

Destructors and Finalizers

As Hans Boehm argues in a recent paper, finalizers should run asynchronously to the client code. I will support this in my memory manager by queuing up finalization-ready objects and leaving it to the client to actually run the finalizers through the API function mm_run_pending_finalizer.

Lush "OO objects" may have a destructor method. For stack-allocated objects, the destructor method should be invoked when the object goes out of scope. For heap-allocated objects, the destructor method should be invoked after the object becomes unreachable. In other words, "OO objects" need a finalizer, and one of its jobs is to call the object's destructor method. This makes it clear that, when there are several finalization-ready objects, the order of finalizer invocation is important.
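The queue-and-drain scheme could look roughly like this. The collector only enqueues; the client drains the queue at a safe point. Apart from the name mm_run_pending_finalizer, which the post mentions, everything here (the FIFO, the signatures) is an invented illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*finalizer_t)(void *obj);

#define QUEUE_MAX 64
static struct { void *obj; finalizer_t fin; } queue[QUEUE_MAX];
static size_t q_head = 0, q_tail = 0;

/* Called by the collector for each finalization-ready object. */
static void queue_finalizable(void *obj, finalizer_t fin)
{
    queue[q_tail].obj = obj;
    queue[q_tail].fin = fin;
    q_tail = (q_tail + 1) % QUEUE_MAX;
}

/* Run one pending finalizer; return 1 if one ran, 0 if the queue is empty.
 * The client calls this at safe points, e.g. from its idle loop. */
static int mm_run_pending_finalizer(void)
{
    if (q_head == q_tail)
        return 0;
    queue[q_head].fin(queue[q_head].obj);
    q_head = (q_head + 1) % QUEUE_MAX;
    return 1;
}

/* Toy finalizer used for demonstration. */
static int finalize_count = 0;
static void count_finalizer(void *obj) { (void)obj; finalize_count++; }
```

Because the FIFO preserves insertion order, the collector can control finalization order simply by the order in which it enqueues objects.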

Current Lush has a facility for registering "finalizers" with individual objects. These are used to notify other objects that hold a weak reference to an object. For instance, a hash table may hold weak references to key objects, and events in the event queue hold a weak reference to an event handler object. The "finalizers" registered by the hash table and by the event queue do not look at the object, but only need to know an object's identity to update the hash table, or event queue, respectively. I would call them "notifiers" rather than "finalizers".

To support this mechanism for maintaining weak references, I am extending the memory manager API with a new function, mm_notify. This function just marks a managed object. When the object has become unreachable and is about to be reclaimed by the memory manager, a notification function is called with the object's address as argument. The notification function is not specific to an individual object or an object's type; it is given to mm_init when the system is initialized.
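Sketched in code, the idea is: one global notification function registered at init time, a per-object "notify me" mark, and a hook in the reclamation path. The marking table and the reclaim function below are stand-ins for the real collector machinery; only mm_notify and mm_init come from the post, and even their signatures are guesses.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*notify_fn_t)(void *obj);
static notify_fn_t global_notifier = NULL;

#define TABLE_MAX 64
static void *notify_table[TABLE_MAX];  /* objects marked for notification */
static size_t notify_count = 0;

static void mm_init_sketch(notify_fn_t fn) { global_notifier = fn; }

static void mm_notify(void *obj)       /* mark obj for notification */
{
    notify_table[notify_count++] = obj;
}

/* Called by the (faked) collector when obj is about to be reclaimed. */
static void reclaim(void *obj)
{
    for (size_t i = 0; i < notify_count; i++) {
        if (notify_table[i] == obj) {
            global_notifier(obj);      /* e.g. purge weak refs to obj */
            notify_table[i] = notify_table[--notify_count];
            break;
        }
    }
    /* ... actual memory reclamation would happen here ... */
}

/* Toy notifier used for demonstration. */
static void *last_notified = NULL;
static void remember(void *obj) { last_notified = obj; }
```

Since the notifier only learns the object's identity, not its contents, this matches the "notifier, not finalizer" role described above: a hash table or event queue can drop its weak entries without ever touching the dead object.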

Monday, June 23, 2008

Would you like some cream with that?

Just a quick update: The memory manager now organizes small objects of the same memory type in blocks; a client needs to register a memory type before the first allocation; for larger objects the memory type is stored right in front of the object itself (similar to Nickle's scheme). This allows me to store just the addresses instead of pairs of object and memory type addresses. Here is a brief documentation of the current API.
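The "memory type in front of the object" trick for larger objects can be sketched as follows: allocate one extra header word, store the type id there, and hand out the address just past it. All names here are invented; the real API and header layout certainly differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t memtype_t;  /* a full word as header keeps the
                                 payload word-aligned               */

static void *mm_alloc_large(memtype_t t, size_t n)
{
    memtype_t *p = malloc(sizeof(memtype_t) + n);
    if (!p) return NULL;
    *p = t;                   /* memory type sits right in front    */
    return p + 1;             /* client sees only the payload       */
}

static memtype_t mm_typeof(void *obj)
{
    return ((memtype_t *)obj)[-1];  /* look one word back           */
}
```

With the type recoverable from the address alone, the manager needs no separate address-to-memtype table for these objects, which is exactly the simplification the update describes.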

The next steps are (1) extension of the API by macros for tracking transient objects, (2) tearing open and stitching back together the Lush interpreter, (3) debugging, debugging, debugging, (4) implementing asynchronous collect.

Since there are no special functions for dealing with ATs anymore, I have a feeling this could become a library...

Tuesday, June 17, 2008

Making ATs tiny

As I almost said in my last post, ATs are used almost everywhere in the interpreter. They are a kind of abstract reference that carries runtime type information and a pointer to a Lisp object. Small and ubiquitous objects like numbers and conses, though, are stored in the AT itself. In other words, when an AT references a number, the AT does not contain a pointer to a number object but contains the number itself, saving the cost of one level of indirection.

Another important role of ATs is that they are the handles to all objects for the current memory management system, which is based on reference counting. An AT contains a reference count to keep track of the number of references from other objects to the object the AT represents. Doing away with reference counting is one of the goals of this project, and so ATs will become smaller because we do not need to store reference counts anymore. On a 32-bit platform the current size of an AT is 16 bytes (4 words), which will drop to 12 bytes (3 words) without the reference count.

Could ATs be made smaller still? If we were pure and did not treat small objects specially, then an AT could be just a pair of pointers---one pointer to the data representing an object and one pointer to a structure representing the object's type. An AT would be just 2 words in size (and with 2 words, numbers and gptrs could in fact still be represented directly). Would redefining ATs in this way be a good idea? Or would the level of indirection introduced for conses result in a noticeable performance hit?

What's the structure of a cons? A cons is just a pair of pointers (pointers to two other ATs). Note that this is the same structure that the "micro AT" just outlined would have. This leads me to wonder whether it could somehow be possible to represent conses and ATs using the same 2-word structure. An AT representing a cons would just be the cons itself rather than a reference to it. But how, then, could I distinguish ATs that are conses from ATs that are references to other kinds of objects?
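One classic answer to that question is pointer tagging: heap pointers are word-aligned, so their low bit is always zero, and that bit can be stolen as a "this AT is itself a cons" flag. The sketch below illustrates the idea only; it is not Lush code, and all names are invented.

```c
#include <assert.h>
#include <stdint.h>

typedef struct at {
    uintptr_t car;            /* pointer, or tagged pointer */
    uintptr_t cdr;
} at_t;

#define CONS_TAG ((uintptr_t)1)

static int at_is_cons(const at_t *a)
{
    return (a->car & CONS_TAG) != 0;
}

static at_t make_cons(at_t *head, at_t *tail)
{
    at_t a;
    a.car = (uintptr_t)head | CONS_TAG;   /* low bit marks "I am a cons" */
    a.cdr = (uintptr_t)tail;
    return a;
}

static at_t *cons_head(const at_t *a)
{
    return (at_t *)(a->car & ~CONS_TAG);  /* strip the tag to dereference */
}
```

The price is one mask operation on every car access, which may or may not beat the extra indirection of boxed conses; that would have to be measured.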

I will give that some thought. But today I will start with simplifying ATs by throwing out flag bits that are currently used for fast runtime type checking but are not really necessary because the type is also encoded by a pointer to the type structure. My plan is to simplify the AT structure as much as possible and then switch to the new memory manager that I have been growing in a new source file but which is not linked in yet.

Wednesday, June 11, 2008

ATs are special

The most commonly used (dynamically allocated) object in the interpreter is the AT structure, by which conses, numbers, and pointers are represented. ATs may well make up 70% or more of all objects to be handled by the interpreter (my guesstimate).

Since they come in such large numbers, I treat ATs specially and allocate them in a dedicated list of blocks (Lush currently does this, btw). This helps alleviate the increased resource demand that comes with the new garbage collection scheme. First, I do not need to store address-memtype associations for AT objects. This saves some space and decreases the n in the O(log n) lookup time for all other addresses. Second, the mark bits for ATs are stored in the blocks themselves, and marking an AT can be done in essentially the same way as in Nickle. That is, constant-time marking for most managed objects!

Monday, June 2, 2008

Lessons from Week One

Summer of Code has been on for one week now. I have spent more of it reading and thinking than coding, but I am much clearer about the design for Lush's new memory manager. As I said in my project proposal, Nickle's garbage collector is my primary model, but a few extra requirements force me to do some things differently for Lush.

The first requirement is that the collector should utilize a second processor. Keith Packard suggested forking a child process to do the collection. The child would do the mark & sweep and send addresses of non-live heap objects back to the parent process via a pipe. The parent then does the actual reclamation (rebuilding the free lists) in an incremental fashion (e.g., within the interpreter's idle function or as part of alloc). Forking creates a lazy copy of the parent's address space (copy-on-write), and the cost is that of duplicating the parent's page tables and creating the child's task structure.

So far so good. Now Nickle's GC stores the mark bit just next to the memory object itself. This means that during mark and sweep, the GC needs to do memory writes for all live objects, which are scattered all over the heap. This would throw fork's low cost benefit out of the window. I therefore need to store the mark bits somewhere else, and compactly so in memory. Further, the GC needs to find the mark bit for a given object (address) quickly, or the mark phase will take forever. My solution is to store object address--GC data associations in a dense array, sort the array at the beginning of a collection cycle, and do binary search for the lookup.
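The sort-then-binary-search scheme just described might look like this. The shape of the per-object GC data and all the names are invented; only the (address, GC data) array, the per-cycle sort, and the O(log n) lookup come from the design above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct gc_entry {
    void    *addr;            /* object address                */
    unsigned mark;            /* mark bit(s), stored compactly */
} gc_entry_t;

static int cmp_entry(const void *a, const void *b)
{
    const gc_entry_t *ea = a, *eb = b;
    if (ea->addr < eb->addr) return -1;
    if (ea->addr > eb->addr) return 1;
    return 0;
}

/* Sort once at the start of a collection cycle... */
static void gc_sort(gc_entry_t *tab, size_t n)
{
    qsort(tab, n, sizeof(gc_entry_t), cmp_entry);
}

/* ...then every mark-bit lookup is an O(log n) binary search. */
static gc_entry_t *gc_lookup(gc_entry_t *tab, size_t n, void *addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr == addr)
            return &tab[mid];
        else if (tab[mid].addr < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;              /* not a managed address */
}
```

Crucially, the mark bits live in this dense array, not next to the objects, so the marking writes of the child process touch only a few pages and the copy-on-write fork stays cheap.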

Nickle's GC scheme allows very fast (constant time) access to the mark bit. My solution requires O(log n) time per lookup in an array with n addresses. It is a little ironic that utilizing a second processor requires doing much more work. It is quite possible that Nickle's scheme, using only one processor, does actually perform better in practice than the two-processor solution (however, I do not feel inclined to do the experiment and implement both schemes).

More on this later.

Friday, May 9, 2008

Because I can

Ralf got a blog, there you go!

Over the next few months you will find notes on my Google Summer of Code project here. Stay tuned.