JK's Back-Link On Demand Garbage Collection
Author: Johann E. Klasek

Inhalt

  1. Technical Aspects
  2. Run-time behavior
  3. Download
  4. Copyright and legal notice
  5. Further plans
  6. References

Technical Aspects

How is BLOD-GC embedded into BASIC?

Patching the BASIC interpreter

There are four places the interpreter needs to be hooked to get BLOD-GC working. The BASIC ROM will be copied into the underlaying RAM and the old GC routine is redirected to the new BLOD-GC code which is locate in the range from $C000 to $CFFF. Three additional patches are needed which makes sure that every released strings on the heap (which is not actively used anymore) are marked as "free". There are three situation where a string input produces a string output and which has to be handled accordingly:

  1. Variable assignment: An assigment may overwrite an already existing string. A string on the heap of the assigned variable will be marked as free.
  2. String concatenation: Both strings of the operands (if they are located on the heap) will be marked as free.
  3. Partial string extraction: The following BASIC functions LEFT$, RIGHT$, MID$ produces a new partially string of an input string which has to be marked as free if it is located on heap.

All other cases with temporarily created strings are immediatly removed from the heap. They have not to be marked as free.

How does the algorithm work?

The main difference of this implementation in regard to the well-known implementations from BASIC 4.0 or later at BASIC 3.5 and BASIC 7.0 is that there is no need to reserve 2 bytes for each string on the string heap which contains the back-link or gap marker. BLOD-GC stores the back-link or gap marker into the string itself (overwritting the tail of the string). For active strings the overwritten data is temporarily stored into the descriptor (the addree is not needed in at this stage). There only one special case, if the string is of length 1 in which case it cannot take the back-link or gap marker ($FF with the gap length). For this situation a new gap marker with value 1 is introduced. This works because a back-link address never is in page $01 (the stack page).
Only active strings contains the back-link in the last two bytes. The so called "back-link" points back to the string descriptor which on the string descriptor stack, in a single variable, or part of an array.

Here is a diagram showing Variables, the String Descriptor Stacks and the String Heap for following command sequence:

B$="Y"
A$="X"
A$="ABC"+"DEF"
PRINT FRE(0)

The collection process is divided into three steps:
  1. Step (stage): All active strings on the string descriptor stack, in variables and arrays gets a back-link - the overwritten bytes are saved in the descriptor. Only in case of strings with length 1 the string is replace by the special gap marker with value 1. The single byte is store in the descriptor (low byte of string address).
    The remaining gap on the heap shuld be already marked as free.
  2. Step (stage): The heap will be compacted by walking through the heap from the highest address downwards. The prerequisite that all free gaps are appropriately marked (see section above) or in case of an active string contains the back-link to the descriptor (built-up in stage 1). All active strings (which do not has the free marker $FF or $01 in the high byte of the back-link) are taken and will be copied to the upper heap area without gaps. Gaps are skipped during the walk-through. A copied stringt is restored (both upper bytes are taken from the descriptor) on the fly and the descriptor gets the string address with the new string location.
  3. Step (stage): The so far missing strings of length 1 are restored. In the whole descriptor walk-through all strings of length 1 are processed and placed on the heap. Actually, only strings which was located on heap are regarded, which has been marked appropriate (the address is $00xx which normally isn't possible). The single byte currently stored in the descriptor is copied on the heap and the string address back to the descriptor.

Run-time behavior

In comparison to the standard BASIC V2 the run-time of BLOD-GC is 2300 times faster (for 9600 strings).
BLOD-GC is on the same level as
Super Garbage Collection and slightly faster than BASIC 7.0 on a C128 or BASIC 3.5 on a Plus/4 and C16, respectively. On a C64 only the also linear implementation GarbColl is faster, but suffers from the fact, that each string as an additional overhead of 2 bytes (which prevents a runable 9600 string test).
The testing programs can be found on D64 image, see section Download

Program GC-DEMO - stress test
[h:mm:s]
GC-STD-TEST 9600 strings
[h:mm:s]
BLOD-GC 1:43 3,1
BASIC V2 26:22 2:04:27,0
SuperGC 1:43 3,1
Garbage64 (NG) 1:46 2,8
GarbColl (NG) 1:38
BASIC 3.5 1:48 3,2
BASIC 7.0 2:11 3,6
BASIC 7.0 FAST 1:03 1,8
BASIC 4.0 CBM 8xxx 1:30
BASIC 4.0 CBM 4xxx 1:26
BASIC 4.0+ CBM-II (2 MHz 6509) 1:10 2,1

Download



Copyright and legal notice

The programs may be used and distributed freely as long the reference to author remains.

Further plans

References

  1. C64-Wiki.com: Garbage Collection
  2. C64-Wiki.de: Garbage Collection (german)
  3. ROM listing of the original routine:
    the automatically added comments are not very useful, rather irritating.
  4. Jim Butterfield article list from Compute!
    1. Compute! Issue 10 - March 1981: Learning About Garbage Collection
    2. Compute! Issue 49 - June 1984: Garbage Collection On Commodore Computers
    3. Compute! Issue 50 - July 1984: Commodore Garbage Collection Part 2
  5. Article from c't magazine 1984/6 S. 72: BASIC intern - Was nicht im Handbuch steht: Teil 3: Strings und 'Garbage collection' (german)
  6. Book "Microsoft-Basic: Konzepte, Algorithmen und Datenstrukturen", Luidger Röckrath, Franzis' Verlag, ISBN: 3772380115 (german)
    6.2.3 Die Stringspace-Reorganisation Garbage collection (GARCOL), S. 123

     


    Best viewed with any browser zurück zur Startseite