Haskell Finalizers
==================

  This is intended to be an up-to-date summary of the discussion
  relating to finalizers on the FFI mailing list, up until the end of
  2002.

  The final decision was that, since Haskell finalizers could only be
  implemented in GHC, we would adopt C finalizers.  The problem is
  described in the section titled "Accessing immutable Haskell state"
  below but boils down to atomicity assumptions made by the abstract
  machine.

  The summary is *intended* to be impartial, but since we, the authors
  (Simon M. and Alastair Reid), have our own differing opinions about
  the issues herein, no doubt we have sometimes favoured one side over
  the other, or at least expounded one side of the argument more
  accurately than the other.  Please accept our apologies and modify
  accordingly.


The main issue
==============

  The current version of the FFI spec restricts the finalizer routine
  attached to a ForeignPtr to being a C function.  In particular, the
  signatures of the relevant functions are:

    newForeignPtr :: Ptr a -> FunPtr (Ptr a -> IO ()) -> IO (ForeignPtr a)
    addForeignPtrFinalizer :: ForeignPtr a -> FunPtr (Ptr a -> IO ()) -> IO ()

  There is an additional restriction that the C function used as the
  finalizer may not, during its execution, call any function
  implemented in Haskell or hs_perform_gc().  (This is the same
  restriction placed on foreign functions imported with the 'unsafe'
  attribute.)
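  As an illustration of the restricted, C-only form, here is a minimal
  sketch written against the Foreign.ForeignPtr module of present-day
  GHC's base library.  This is an assumption about a later API: in that
  module newForeignPtr takes the finalizer *first*
  (FinalizerPtr a -> Ptr a -> IO (ForeignPtr a), where FinalizerPtr a
  is a synonym for FunPtr (Ptr a -> IO ())), the reverse of the order
  quoted above.  The names p_free, newBuffer, writeByte and readByte are
  hypothetical; base itself exports an equivalent of p_free as
  finalizerFree.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (FinalizerPtr, ForeignPtr, finalizeForeignPtr,
                           newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (mallocBytes)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- The address of C's free(), imported as a function pointer so that it
-- can be attached as a C finalizer.  free() never calls back into
-- Haskell, so it satisfies the 'unsafe'-style restriction.
foreign import ccall unsafe "stdlib.h &free"
  p_free :: FinalizerPtr a

-- Allocate n bytes with malloc and hand ownership to a ForeignPtr whose
-- only finalizer is free().
newBuffer :: Int -> IO (ForeignPtr Word8)
newBuffer n = do
  p <- mallocBytes n
  newForeignPtr p_free p

-- Store / load a byte at an offset, keeping the buffer alive meanwhile.
writeByte :: ForeignPtr Word8 -> Int -> Word8 -> IO ()
writeByte fp off b = withForeignPtr fp $ \p -> pokeByteOff p off b

readByte :: ForeignPtr Word8 -> Int -> IO Word8
readByte fp off = withForeignPtr fp $ \p -> peekByteOff p off
```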

  The primary issue under debate is whether these functions can be
  reasonably generalised to

    newForeignPtr :: Ptr a -> IO () -> IO (ForeignPtr a)
    addForeignPtrFinalizer :: ForeignPtr a -> IO () -> IO ()

  (This is a generalisation because any finalizer of type
  FunPtr (Ptr a -> IO ()) can be turned into a computation of type
  IO () via 'foreign import dynamic'.)
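  A sketch of this generalisation, assuming present-day GHC's base
  library: there, Foreign.Concurrent.newForeignPtr has exactly the
  generalised type Ptr a -> IO () -> IO (ForeignPtr a), and a
  'foreign import ccall "dynamic"' converts any C finalizer into an
  IO () action.  The names callFinalizer, newTrackedPtr and
  newTrackedBlock are hypothetical.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Foreign.ForeignPtr (FinalizerPtr, ForeignPtr, finalizeForeignPtr)
import qualified Foreign.Concurrent as FC
import Foreign.Marshal.Alloc (finalizerFree, mallocBytes)
import Foreign.Ptr (Ptr)

-- 'foreign import dynamic' turns a C function pointer into an ordinary
-- Haskell function, so any C finalizer can be run as an IO () action.
foreign import ccall "dynamic"
  callFinalizer :: FinalizerPtr a -> Ptr a -> IO ()

-- A ForeignPtr with a Haskell finalizer of type IO (): it runs the given
-- C finalizer and then records in 'done' that it has run.
newTrackedPtr :: IORef Bool -> FinalizerPtr a -> Ptr a -> IO (ForeignPtr a)
newTrackedPtr done cfin p =
  FC.newForeignPtr p (callFinalizer cfin p >> writeIORef done True)

-- Example use: a malloc'd block whose Haskell finalizer delegates to
-- base's finalizerFree (a FunPtr to C's free()).
newTrackedBlock :: IORef Bool -> Int -> IO (ForeignPtr a)
newTrackedBlock done n = do
  p <- mallocBytes n
  newTrackedPtr done finalizerFree p
```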

  Note that with the current version of these functions, if the foreign
  finalizer were allowed to call back into Haskell, you could
  essentially implement the second version.  So removing this
  restriction and generalising the finalizers to be of type 'IO ()'
  are essentially the same thing.

  Semantics
  ---------

    Finalizers are guaranteed to run after the ForeignPtr becomes
    unreachable.  We neither define "unreachable" nor specify a bound
    on the length of time between the point at which the ForeignPtr
    becomes unreachable and the time at which the finalizer begins
    executing.

    C finalizers are atomic with respect to other C finalizers and C
    functions called from Haskell.


Rationale
=========

  Why is this a useful generalisation?

  So far we have identified 3 useful categories of functionality that
  would be enabled with Haskell finalizers:

  Composite finalizers
  --------------------

    This is when a finalizer needs to perform multiple operations, and
    composing them in Haskell is simpler than writing a separate C
    routine for the composition.
    
    eg(1). in Text.Regex.Posix, the finalizer for a regular expression
    needs to call both c_regfree and free on the pointer (in that
    order).  This can be handled in three ways: by writing a single
    Haskell finalizer, by using addForeignPtrFinalizer to add both
    finalizers, or by writing a separate C routine which calls both
    functions.  The two Haskell alternatives are more convenient, whilst
    the third is what one would normally do as a C programmer.
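    The two Haskell alternatives can be sketched as follows, assuming
    GHC's Foreign.Concurrent module.  Since c_regfree is not bound here,
    fakeRegfree is a hypothetical stand-in that only logs; loggedFree,
    newRegexSingle and newRegexTwo are likewise illustrative names.
    Note that GHC documents that a finalizer added with
    addForeignPtrFinalizer runs *before* those already registered, so
    the required order must be arranged by registering free first.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import Foreign.ForeignPtr (ForeignPtr, finalizeForeignPtr)
import qualified Foreign.Concurrent as FC
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr)

-- Hypothetical stand-in for c_regfree: it only logs, so the sketch runs
-- without binding to regex.h.
fakeRegfree :: IORef [String] -> Ptr a -> IO ()
fakeRegfree logRef _ = modifyIORef logRef (++ ["regfree"])

-- Foreign.Marshal.Alloc.free, plus a log entry.
loggedFree :: IORef [String] -> Ptr a -> IO ()
loggedFree logRef p = free p >> modifyIORef logRef (++ ["free"])

-- Alternative 1: a single Haskell finalizer sequencing both steps.
newRegexSingle :: IORef [String] -> Int -> IO (ForeignPtr a)
newRegexSingle logRef n = do
  p <- mallocBytes n
  FC.newForeignPtr p (fakeRegfree logRef p >> loggedFree logRef p)

-- Alternative 2: two finalizers.  The one added last runs first, so free
-- is registered first and regfree second.
newRegexTwo :: IORef [String] -> Int -> IO (ForeignPtr a)
newRegexTwo logRef n = do
  p <- mallocBytes n
  fp <- FC.newForeignPtr p (loggedFree logRef p)
  FC.addForeignPtrFinalizer fp (fakeRegfree logRef p)
  return fp
```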
    
    eg(2). we might want to free a composite object; say an array of
    strings.  In C we would write the obvious C finalizer:

      #include <stdlib.h>

      /* Free a NULL-terminated array of malloc'd strings. */
      void freeStrings(char** ss) {
        for (int i = 0; ss[i]; ++i) { free(ss[i]); }
        free(ss);
      }

    whereas in Haskell, we can write the finalizer more concisely as:
    
      do els <- peekArray0 nullPtr ptr; mapM_ free els; free ptr
    
    The tradeoff here is between the naturalness of finalizing C
    objects using C code and the conciseness of Haskell.
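    A slightly fuller version of the Haskell one-liner, as a sketch
    assuming GHC's Foreign.Concurrent for attaching an IO () finalizer.
    The names freeStrings (mirroring the C routine) and newStrings are
    illustrative, and freeStrings returns a count purely so that the
    behaviour can be checked.  newCString and newArray0 both allocate
    with malloc, so free is the right deallocator.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Foreign.C.String (CString, newCString)
import Foreign.ForeignPtr (ForeignPtr, finalizeForeignPtr)
import qualified Foreign.Concurrent as FC
import Foreign.Marshal.Alloc (free)
import Foreign.Marshal.Array (newArray0, peekArray0)
import Foreign.Ptr (Ptr, nullPtr)

-- Free every string in a NULL-terminated array, then the array itself.
-- Returns the number of strings freed.
freeStrings :: Ptr CString -> IO Int
freeStrings ss = do
  els <- peekArray0 nullPtr ss
  mapM_ free els
  free ss
  return (length els)

-- Build a NULL-terminated array of C strings owned by a ForeignPtr whose
-- Haskell finalizer is freeStrings; the count is stored in 'freed'.
newStrings :: IORef Int -> [String] -> IO (ForeignPtr CString)
newStrings freed strs = do
  cstrs <- mapM newCString strs
  arr <- newArray0 nullPtr cstrs
  FC.newForeignPtr arr (freeStrings arr >>= writeIORef freed)
```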

    Note that in these examples it is only because we are using a
    ForeignPtr that we have to write the finalizer in C: if we do
    explicit allocation/freeing then it can all be done in Haskell.
    On the other hand, if the object allocator is written in C, it is
    usual to write the finalizer in C as well, so it is not clear that
    either example imposes a real burden.

    This issue is finely balanced: here Haskell finalizers give little
    benefit over C finalizers.
    

  Taking advantage of closures
  ----------------------------

    The Haskell finalizer can have free variables, whereas the C
    routine only has access to global variables.

    eg. Suppose you have a custom allocator that requires the size of
    the object to be passed to the free routine.  In Haskell, we'd
    write

      p <- my_allocate size
      fp <- newForeignPtr p (my_free p size)

    whereas in C we'd have to allocate a separate structure to contain
    the pointer and its size, essentially programming the closure
    explicitly.
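    Both styles can be sketched in Haskell, assuming GHC's
    Foreign.Concurrent.  my_allocate and my_free are placeholders in the
    text, so myAllocate and myFree below are hypothetical stand-ins built
    on malloc/free, with a counter of released bytes added for checking.
    The second half mimics what a C finalizer, which receives only the
    pointer, would need: the size stored in a header word just before
    the object.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, finalizeForeignPtr)
import qualified Foreign.Concurrent as FC
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr, castPtr, plusPtr)
import Foreign.Storable (peek, poke, sizeOf)

-- Hypothetical sized allocator: the free routine must be told the size.
myAllocate :: Int -> IO (Ptr Word8)
myAllocate = mallocBytes

myFree :: IORef Int -> Ptr Word8 -> Int -> IO ()
myFree released p size = free p >> modifyIORef released (+ size)

-- With a Haskell finalizer, the closure simply captures 'size'.
newSized :: IORef Int -> Int -> IO (ForeignPtr Word8)
newSized released size = do
  p <- myAllocate size
  FC.newForeignPtr p (myFree released p size)

-- Without closures, the size must travel with the pointer: here it is
-- kept in a header word in front of the object (an explicit closure).
headerBytes :: Int
headerBytes = sizeOf (0 :: Int)

allocWithHeader :: Int -> IO (Ptr Word8)
allocWithHeader size = do
  base <- mallocBytes (headerBytes + size) :: IO (Ptr Word8)
  poke (castPtr base) size
  return (base `plusPtr` headerBytes)

-- A finalizer of the shape Ptr a -> IO (): it recovers the size itself.
freeWithHeader :: IORef Int -> Ptr Word8 -> IO ()
freeWithHeader released p = do
  let base = p `plusPtr` negate headerBytes :: Ptr Word8
  size <- peek (castPtr base)
  free base
  modifyIORef released (+ size)
```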

  Communicating with a foreign runtime/garbage collector
  ------------------------------------------------------

    The FFI standard does not currently address the issues of calling
    languages which have their own garbage collector.  When these are
    addressed it seems likely that we will have to provide additional
    'hooks' into the Haskell runtime system to allow the foreign GC to
    communicate with the Haskell GC.

    George Russell claimed to have an example where it is also
    necessary for the foreign GC to invoke Haskell finalizers, but 
    on closer examination it became clear that what is needed is the
    ability to call hs_free_stable_ptr / hs_free_fun_ptr and a richer
    interface to the garbage collector to deal with cycles that span
    both the Haskell and the foreign heap.  In the end, no need for
    Haskell finalizers was demonstrated.


  Expressing liveness dependencies between ForeignPtrs
  ----------------------------------------------------

    The current FFI spec says, under the documentation for touchForeignPtr:

      This function can be used to express liveness dependencies
      between ForeignPtrs: For example, if the finalizer for one
      ForeignPtr touches a second ForeignPtr, then it is ensured that
      the second ForeignPtr will stay alive at least as long as the
      first. This can be useful when you want to manipulate interior
      pointers to a foreign structure: You can use touchForeignObj to
      express the requirement that the exterior pointer must not be
      finalized until the interior pointer is no longer referenced.

    If finalizers are restricted to being C-only, then we need an
    alternative mechanism to be able to express these dependencies.

    (Side note: we believe this technique does not work in GHC at the
    moment, although you can achieve the desired effect using Weak
    pointers.)
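    The shape of the technique, as a sketch assuming GHC's
    Foreign.Concurrent: an interior pointer gets a Haskell finalizer that
    does nothing but touch the exterior ForeignPtr.  newInterior is a
    hypothetical name.  The code only shows how the dependency is
    expressed; whether the exterior object really outlives the interior
    pointer depends on the collector and, per the side note, may not be
    guaranteed by every implementation.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes,
                           touchForeignPtr, withForeignPtr)
import qualified Foreign.Concurrent as FC
import Foreign.Ptr (plusPtr)
import Foreign.Storable (peekByteOff, poke)

-- An interior pointer 'off' bytes into the object managed by 'outer'.
-- The raw pointer escapes withForeignPtr; it is meant to stay valid
-- only because the finalizer below refers to (touches) 'outer'.
newInterior :: ForeignPtr Word8 -> Int -> IO (ForeignPtr Word8)
newInterior outer off =
  withForeignPtr outer $ \base ->
    FC.newForeignPtr (base `plusPtr` off) (touchForeignPtr outer)
```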


  Other cleanups
  --------------

   1. MarshalAlloc has to export a FunPtr version of free, so that it
      can be used in a finalizer.

   2. A general point: the philosophy of the FFI is to move as much
      marshalling machinery into Haskell as possible, on the grounds
      that we believe Haskell is likely to be more expressive than the
      foreign language, and hence writing marshalling code in Haskell
      is likely to be easier.  Requiring that finalizers be C
      functions seems to go against the grain here.

      On the other hand, writing the finalizer for a C object in C is
      a very natural way to write code and allows us to take advantage
      of what syntactic convenience and typechecking C affords us.


  Accessing immutable Haskell state
  ---------------------------------

    There are significant implementation problems if the finalizer
    has to evaluate a thunk (unevaluated expression) currently being
    evaluated by the main program. 
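    To make the scenario concrete, here is a hypothetical sketch in
    which a single shared thunk, table, is demanded both by the main
    program and by a Haskell finalizer (attached with GHC's
    Foreign.Concurrent; newSharingPtr is an illustrative name).  Under
    GHC this behaves correctly, because the finalizer runs in its own
    thread and would block if it met the thunk mid-evaluation; the
    concern is implementations lacking such a mechanism.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Foreign.ForeignPtr (ForeignPtr, finalizeForeignPtr)
import qualified Foreign.Concurrent as FC
import Foreign.Marshal.Alloc (free, mallocBytes)

-- The finalizer demands the (possibly unevaluated) list 'table' by
-- summing it.  If the main program shares the same thunk, the finalizer
-- may encounter it while the main program is part-way through
-- evaluating it.
newSharingPtr :: IORef Int -> [Int] -> IO (ForeignPtr ())
newSharingPtr out table = do
  p <- mallocBytes 8
  FC.newForeignPtr p $ do
    writeIORef out $! sum table
    free p
```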

    A pure graph reduction machine performs evaluation using a series
    of small, atomic, non-overlapping reduction steps.  Context
    switches between threads may be made between any two reductions.
    Unfortunately, no Haskell compiler implements pure graph reduction.

    Hugs and (I believe) NHC are based on variants of the G-machine.
    An important property of the G-machine is that reduction steps are
    not small, not atomic and may overlap.  Making context switches
    'between' reductions isn't feasible because there are no gaps
    between reductions and making context switches during reductions
    is hard because the heap isn't in a consistent state - some of the
    essential state of the evaluation is in the stack (where it is
    hard to access).

    GHC is based on the STG-machine.  Like the G-machine, the STG
    machine has large, non-atomic, overlapping reductions but two
    techniques have been developed which allow context switches in
    'the middle' of a reduction step.

    1) When a thunk is being evaluated, the thunk is overwritten with
       a 'blocking queue'.  If a thread tries to evaluate a blocking
       queue, the thread is blocked (put to sleep, removed from the
       run queue) and added to the blocking queue, to be woken up when
       the thread evaluating the thunk finishes and updates it with its
       value.  (Deadlock may result in some circumstances; this is
       handled by a generic deadlock-detection mechanism.)

       A worthwhile variant of this is to delay overwriting the thunks
       until a context switch happens.

       Obviously, the use of blocking queues relies on being able to
       block a thread.

    2) The STG machine also provides a way to abandon the evaluation of
       a thunk, with the possibility of resuming it later, using
       resumable black holes.

          A. Reid, 
          Putting the Spine back in the Spineless Tagless G-machine: 
           An Implementation of Resumable Blackholes.
          Proceedings of Implementation of Functional Languages (IFL98), 
          Lecture Notes in Computer Science, volume 1595, pp 189-202, 
          Springer Verlag, 1999
          http://www.reid-consulting-uk.ltd.uk/alastair/publications/ifl98.ps.gz

       This mechanism allows the thread state on the stack to be
       transferred back onto the heap.  That is, it is another
       mechanism with which threads may be blocked.  Thus, on
       encountering a thunk which is already being evaluated, we could
       abandon evaluation of this thread and restart it later.  A
       difficult question is how we would know when it was safe to try
       again.

       Implicit in the title of the paper is the idea that this
       technique is only necessary in the _Spineless_ Tagless
       G-machine but it is not actually known whether a similar
       technique is needed in the G-machine as well.

    Both techniques rely on an important detail of GHC's
    implementation: the evaluation stack is explicitly managed by the
    Haskell compiler.  This means that the Haskell compiler is free to
    change and/or exploit the layout to make blocking threads or
    construction of resumable black holes possible.

    In contrast, Hugs' implementation is such that the evaluation
    stack is managed by the C compiler, severely limiting our ability
    to implement blocking.  (How NHC manages its stack is currently
    unknown to us.)  One possibility is to use setjmp/longjmp to
    discard segments of the C stack (as is used to implement exception
    handlers) to abandon evaluation until a later date.


  Manipulating mutable Haskell state
  ----------------------------------

    There are significant problems when one wants to manipulate
    mutable Haskell state from a Haskell finalizer.  This is discussed in the
    section on mutable state, below.


  Concurrency
  -----------

    If the implementation supports preemptive concurrency, then the
    problems with mutable state are easily solved.  With concurrency
    *and* mutable state, there are plenty of good reasons for wanting
    Haskell finalizers.
    
    Dissenting note: no example of this has been presented.  (An
    alleged example turned out to be an example of adding a Haskell
    finalizer to a Haskell object.  The Haskell object contained a
    buffer which was to be flushed and deallocated when the object was
    finalized.  The buffer happened to be a C object but clearly the
    buffer needed to be flushed no matter what.)


Arguments against
=================

    Apart from awkward interactions with mutable state and possible
    implementation problems (both discussed below), the reasons for
    wanting C finalizers are:

    1. Experience.  Hugs has had C finalizers forever; we know they
       are sufficient and can be implemented readily.

       (NHC also has C finalizers and GHC had C finalizers for about 5
       years before switching to Haskell finalizers.)

    2. Adding Haskell finalizers drags in many concurrency issues for
       programmers:
       
       - potential to introduce race conditions
       - potential for deadlock
       
       As with most concurrency issues, the resulting bugs are easy to
       introduce, hard to reproduce and hard to track down.
       
    3. Adding Haskell finalizers has a high implementation burden for
       those who have not already implemented preemptive concurrency -
       greatly increasing the complexity of implementing Haskell +
       FFI.

       It is apparently straightforward simply to delay execution of
       finalizers until the program reaches 'safe' points in the IO
       monad but, as discussed below, this runs the risk of starvation.
       
    4. Writing Haskell finalizers which benefit from being written in
       Haskell instead of C (i.e., getting some advantage from the
       more complex implementation) requires the addition of mutable
       Haskell state which can be manipulated atomically by the main
       program without fear of being preempted by finalizers.

       Such mechanisms would have to be specified in the FFI spec and,
       of course, implemented.

    5. The complexity of the solution seems to be out of line with the
       simplicity of the task to be accomplished.


Implementations
===============

  GHC
  ---

    Currently implements Haskell finalizers.  Finalizers are run in a
    separate thread from other Haskell threads (although several
    finalizers may be run in the same thread) using GHC's preemptive
    concurrency.

  Hugs
  ----

    A prototype patch implementing Haskell finalizers has been
    submitted.  The technique used is to delay the execution of
    finalizers until a "safe" point in the runtime system is reached,
    and then recursively invoke the evaluator to run the finalizer.
    The effect is that the finalizer can "preempt" any running Haskell
    computation, and finalizers can even preempt each other.

    The patch does not address any of the issues described in the
    section on mutable state below, and it is believed that deadlock
    could result if the finalizers used MVars.  More seriously, the
    patch does not deal with the problems discussed in the section on
    immutable state, above.

    An alternative implementation would be to piggyback on Hugs'
    cooperative concurrency and place finalizers into the "thread
    pool" (or whatever Hugs uses for this).  Unfortunately, this runs
    the risk of starvation since context switches only occur when
    certain IO operations are run: programs that don't call those IO
    operations will fail to execute finalizers.

  NHC
  ---

    Malcolm Wallace claims that Haskell finalizers can also be
    implemented in NHC.  As far as we know, there are no further
    issues here (except those relevant to both Hugs & NHC, which are
    discussed below).

    [Closer examination suggests that NHC probably suffers from the
    problems discussed in the section on immutable state, above.]

        
Further Implications
====================

  Implementing Haskell finalizers is not the end of the story.  Since,
  using the implementation techniques described above, a Haskell
  finalizer can pre-empt the currently running Haskell computation,
  other issues arise...

Mutable State in Haskell Finalizers
===================================

    IORefs are not a standard part of Haskell and are not required by
    the FFI specification.  Nevertheless, there are three reasons why
    we should consider them when designing the FFI.

    1) They are widely implemented and heavily used.

    2) If absent, they can be implemented (perhaps in an ad hoc way)
       using mutable C state and StablePtrs.

    3) Since the purpose of finalizers is to clean up (mutable) state,
       one of the strongest motivations for writing a finalizer in
       Haskell instead of C is that the finalizer's job is to clean up
       mutable Haskell state involving the C object.

    We want to be able to use mutable Haskell state from a Haskell
    finalizer, but we clearly can't use IORefs.  Finalizers which
    modify IORefs will always contain race conditions.
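    The race can be sketched as follows.  modifyIORef is a separate
    read followed by a write, so a finalizer that runs between the two
    steps can have its update lost.  Present-day GHC's Data.IORef offers
    atomicModifyIORef', which is atomic with respect to other Haskell
    threads, including the thread that runs GHC's finalizers; this
    assumes such a primitive is available, which is exactly what is in
    question for implementations without concurrency.  bumpRacy,
    bumpAtomic and newCounted are hypothetical names.

```haskell
import Data.IORef (IORef, atomicModifyIORef', modifyIORef, newIORef,
                   readIORef)
import Foreign.ForeignPtr (ForeignPtr, finalizeForeignPtr)
import qualified Foreign.Concurrent as FC
import Foreign.Marshal.Alloc (free, mallocBytes)

-- Racy if a finalizer can preempt it: read and write are separate steps.
bumpRacy :: IORef Int -> IO ()
bumpRacy ref = modifyIORef ref (+ 1)

-- Atomic read-modify-write (GHC), safe against a concurrent finalizer.
bumpAtomic :: IORef Int -> IO ()
bumpAtomic ref = atomicModifyIORef' ref (\n -> (n + 1, ()))

-- Count live objects: allocation increments, the finalizer decrements.
newCounted :: IORef Int -> Int -> IO (ForeignPtr ())
newCounted live n = do
  bumpAtomic live
  p <- mallocBytes n
  FC.newForeignPtr p $ do
    free p
    atomicModifyIORef' live (\k -> (k - 1, ()))
```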

    The usual solution to this problem is to use appropriate
    synchronisation primitives, but we don't necessarily have
    concurrency available.

    Indeed, attempting to use synchronization (i.e. MVars) from a
    finalizer in Hugs using the implementation outlined above will
    lead to deadlock, because the finalizer is run outside of the
    scheduling mechanism used to implement cooperative concurrency in
    Hugs.

    Alastair Reid also argues that this restriction will lead to
    non-portable code: anyone who requires synchronization will use
    MVars, and the code will work in GHC, deadlock in Hugs, and won't
    work at all in NHC (which has no MVars).

    Solution 1 - safe points
    ------------------------

      To avoid this problem, execution of finalizers could be restricted
      to 'safe' points in the IO monad.  For example, the programmer
      could explicitly call a function 'runFinalizers' or we could agree
      that certain IO operations constitute 'safe' points.  This runs the
      risk of starvation: the finalizer may never be run if the program
      spends all its time executing pure (non-IO) code or fails to call
      runFinalizers often enough.  It is also rather vulnerable to
      differences between garbage collectors in different systems: at
      the time that a call to runFinalizers is made, a compiler with a
      simple 2-space collector might have identified a ForeignPtr as
      garbage and the finalizer will be run whilst a compiler with a
      sophisticated generational collector might not yet have identified
      the ForeignPtr as garbage and 'miss the boat'.  This may lead to
      starvation since the 'boat' may never come back.
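      A minimal model of this scheme, with every name hypothetical
      (runFinalizers is the name used above; FinalizerQueue and
      enqueueFinalizer are invented for the sketch): the collector
      never runs a finalizer itself but only queues it, and queued
      finalizers run only when the program reaches the explicit safe
      point.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef, writeIORef)

-- Finalizers found by the collector but not yet run.
newtype FinalizerQueue = FinalizerQueue (IORef [IO ()])

newFinalizerQueue :: IO FinalizerQueue
newFinalizerQueue = FinalizerQueue <$> newIORef []

-- What the collector would do on finding a dead ForeignPtr: queue only.
enqueueFinalizer :: FinalizerQueue -> IO () -> IO ()
enqueueFinalizer (FinalizerQueue q) act = modifyIORef q (act :)

-- The explicit safe point.  Runs the pending finalizers in the order
-- they were queued (any queued meanwhile wait for the next call) and
-- returns how many ran.
runFinalizers :: FinalizerQueue -> IO Int
runFinalizers (FinalizerQueue q) = do
  pending <- readIORef q
  writeIORef q []
  sequence_ (reverse pending)
  return (length pending)
```

      If the program never reaches runFinalizers, for instance because
      it is in a long pure computation, the queue simply grows: this is
      the starvation problem described above.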

    Solution 2 - we just don't support it
    -------------------------------------

      One solution is to simply say that using mutable state from a
      finalizer is not supported.  Since mutable state (i.e. IORef or
      MVar) isn't part of the Haskell 98 standard or the FFI, it would
      appear that this conveniently sidesteps the issue.  There are two
      problems with trying to define away the problem in this way:
      
      Most implementations use mutable state internally in their
      libraries, so this might entail some implementation effort to make
      these libraries finalizer-safe.  Without any synchronization
      primitives, making the libraries safe might be tricky or
      impossible (although you could back off and simply say that
      certain library calls cannot be invoked from a finalizer).  Again,
      this leads to portability problems and is bound to result in
      subtle race conditions - a highly undesirable situation!
      
      We don't have any idea about how widespread this problem is
      likely to be, without someone looking through the libraries.
      SimonM had a cursory look through Hugs's IO library (actually
      before submitting the patch to implement Haskell finalizers)
      and didn't see any problems.  GHC's IO library is affected; but
      then it requires various other extensions, including Weak
      references (which provide Haskell finalizers for Haskell
      objects), anyhow.
      
    Solution 3 - find an appropriate synchronisation mechanism
    ----------------------------------------------------------

      Another solution is to find an appropriate synchronization
      mechanism.  A problem is to find something which doesn't impose an
      excessive burden to implement (e.g., Hugs and NHC developers
      consider adding the mechanisms of preemptive concurrency (multiple
      stacks, etc.) to be too much) but which will work well in code
      which already supports preemptive concurrency.  Discussion didn't
      go very far.


  Issues with C concurrency
  -------------------------

    Interactions with a multithreaded C program seem highly relevant
    but turn out to be a red herring.  The goal of the FFI is to
    enable Haskell to communicate with foreign languages.  Even if the
    language supports threads, this does not require that the Haskell
    runtime should be multithreaded.  It only requires that a lock is
    used to ensure that only one foreign thread can call into Haskell
    at a time.  

    Since the foreign language may choose to implement its own
    scheduler and provide its own locks, it is not appropriate for the
    Haskell runtime to choose a locking mechanism such as pthread
    locks.  Rather, the choice of lock should be made based on the set
    of foreign languages which interact with Haskell.

  
Interaction with co-operative concurrency
=========================================

    Earlier we mentioned that it may be possible to implement Haskell
    finalizers using Hugs's co-operative concurrency.  Forgetting for
    the moment the fact that NHC doesn't support this, what other
    issues does this raise?

    The main problem is one of starvation.  In a co-operative system,
    it would be possible to delay the finalizer indefinitely, without
    doing anything particularly strange.  Indeed, the programmer would
    have to arrange to call yield at regular intervals and arrange
    that pure computations always drop back to the IO monad
    occasionally, in order to give finalizers a chance to run.
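    The programming burden this implies can be sketched as follows:
    a pure computation (here, summing a range) is broken into chunks,
    dropping back into the IO monad and calling Control.Concurrent.yield
    after each chunk.  The chunking scheme and the name chunkedSum are
    illustrative assumptions, not a recommended idiom.

```haskell
import Control.Concurrent (yield)
import Control.Exception (evaluate)
import Data.List (foldl')

-- Sum [1 .. n] in chunks of 'chunk' elements (at least one), forcing
-- each partial sum and yielding between chunks so that, under
-- co-operative scheduling, other threads get a chance to run.
chunkedSum :: Int -> Int -> IO Int
chunkedSum chunk n = go 1 0
  where
    step = max 1 chunk
    go lo acc
      | lo > n = return acc
      | otherwise = do
          let hi = min n (lo + step - 1)
          acc' <- evaluate (foldl' (+) acc [lo .. hi])
          yield
          go (hi + 1) acc'
```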

    One might argue that since you have to do this anyway in Hugs -
    another thread can only get control at an explicit yield point -
    doing it for finalizers too isn't so bad.  However, this is a red
    herring since cooperative concurrency is primarily used as a
    structuring mechanism.  For example, the HGL uses a
    producer-consumer communication pattern to communicate between the
    event-driven window system (X11 or Win32) and the procedural
    program.

    Another red herring is the argument that we don't guarantee the
    promptness of finalizers (the Java community tends to the view
    that finalizers are for last-resort cleanup only, not to be relied
    on for normal system operation).  This is a red herring because
    the issue is not one of 'promptness' but of 'starvation'.  We are
    not concerned that execution of a finalizer might be delayed half
    a second or 10 seconds or some other finite time; we are concerned
    that the execution of the finalizer might be delayed indefinitely.
    [It is also open to question whether lessons learnt with imperative
    languages like Java apply to lazy languages.  For example, only
    lazy languages have hGetContents and similar functions built using
    unsafeInterleaveIO.]

    Simon M. tends to the view that implementing Haskell finalizers
    using co-operative concurrency isn't a particularly fruitful
    direction to explore, unless we can't do it any other way.

    Alastair Reid is of the opinion that the risk of starvation makes
    this approach unusable.

  
Proposals
=========

1. No change to the spec for the time being.

   C finalizers are subject to the same restrictions as unsafe foreign
   imports.

   Meanwhile, we can evaluate the need for and cost of implementing
   Haskell finalizers in a range of implementations.  Note that
   upgrading from C finalizers to Haskell finalizers would only
   require that we say that finalizers are invoked with the equivalent
   of a 'safe' ffi call instead of an 'unsafe' ffi call and, no doubt,
   the addition of some convenience functions.

2. No change to the spec for the time being, but we all agree to
   implement an experimental extension to support Haskell finalizers
   and experiment with it.

3. Change the FFI spec to use Haskell finalizers.

   Haskell finalizers would be restricted to not access mutable state
   involving Haskell objects.

   [The awkward construction 'mutable state involving Haskell objects'
   is intended to cover both IORefs and any ad-hoc reinvention of
   IORefs using mutable C state and StablePtrs.  Rephrasings welcome.]

4. Support both Haskell and C finalizers.
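Proposal 1 turns on the difference between 'safe' and 'unsafe' foreign
calls.  In Haskell 2010 and present-day GHC the two are selected by a
keyword on the import, as in this minimal sketch.  C's cos is used only
because it is universally available; c_cos_unsafe and c_cos_safe are
illustrative names.

```haskell
import Foreign.C.Types (CDouble (..))

-- 'unsafe': the C function must not call back into Haskell.  This is
-- the restriction currently placed on C finalizers.
foreign import ccall unsafe "math.h cos"
  c_cos_unsafe :: CDouble -> CDouble

-- 'safe': the C function may call back into Haskell, at some extra
-- cost per call.  Invoking finalizers this way is the upgrade that
-- Proposal 1 describes.
foreign import ccall safe "math.h cos"
  c_cos_safe :: CDouble -> CDouble
```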



--
Simon Marlow, Alastair Reid, Simon Peyton Jones, Malcolm Wallace, Ross
Paterson, Manuel Chakravarty, George Russell, John Meacham, and others
on the ffi@haskell.org mailing list.  (please add or remove names as
appropriate).
