Forth UM/MOD for 6502

Up

2018 - 2021-02-11 J. E. Klasek <j+forth klasek at>

Contents:
  1. FIG Forth
  2. V-Forth
  3. C64 Micro Prod. Forth
    1. Standard assembler JK version
    2. Forth assembler JK version
    3. Forth assembler original version
  4. VolksForth
  5. Tali Forth

FIG Forth

F.I.G. Implementation FIG65ROM.AS

Wrong implementation: Bit 17 isn't handled after ROL 5,X (2nd line after L433).
The execution speed probably suffers by using zero-page indexed addressing mode in the loop.

;                                       U/
;                                       SCREEN 24 LINE 1
;
L418      .BYTE $82,'U',$AF
          .WORD L386     ; link to U*
USLAS     .WORD *+2
          LDA 4,X
          LDY 2,X
          STY 4,X
          ASL A
          STA 2,X
          LDA 5,X
          LDY 3,X
          STY 5,X
          ROL A
          STA 3,X
          LDA #16
          STA N
L433      ROL 4,X
          ROL 5,X
          SEC
          LDA 4,X
          SBC 0,X
          TAY
          LDA 5,X
          SBC 1,X
          BCC L444
          STY 4,X
          STA 5,X
L444      ROL 2,X
          ROL 3,X
          DEC N
          BNE L433
          JMP POP

V-Forth

V-Forth 3.7 fix and fix in official release 3.8.
V-Forth is a FIG Forth derivate for the VIC-20 from Simon J. Rowe <srowe mose org uk>
Source:
Git-Repo UM/MOD

Handles 17th bit in separate case, already very efficient, but release 3.7 had a false assumption, that the carry flag is always set, but is not in all cases.
The execution speed probably suffers by using zero-page indexed addressing mode in the loop.

V-FORTH 3.8

CODE  UM/MOD     ( 31 BIT DIVIDEND-2, -3,  16 BIT DIVISOR-1  *)
                 ( 16 BIT REMAINDER-2,  16 BIT QUOTIENT-1    *)
  SEC 2 + LDA,  SEC     LDY,  SEC 2 + STY,  .A ASL,  SEC    STA,
  SEC 3 + LDA,  SEC 1+  LDY,  SEC 3 + STY,  .A ROL,  SEC 1+ STA,
  10 # LDA,  N STA,
  BEGIN,  SEC 2 + ROL,  SEC 3 + ROL,  SEC 2+ LDA,
          CS  NOT  IF,
             SEC,  BOT SBC,  TAY,
             SEC 3 + LDA,  BOT 1+ SBC,
             CS IF,  SEC 2+ STY,  BEGIN,  SEC 3 + STA,  SWAP  THEN,
          SEC ROL,  SEC 1+ ROL,
          N DEC,  ROT  0=
      UNTIL,    POP  JMP,
      SWAP  THEN,  BOT SBC,  SEC 2+ STA,  SEC 3 + LDA,  BOT 1+ SBC,  SEC,
      CS  NOT  UNTIL,
END-CODE

Fixed V-FORTH 3.7

 259 CODE  UM/MOD     ( 31 BIT DIVIDEND-2, -3,  16 BIT DIVISOR-1  *)
 260                  ( 16 BIT REMAINDER-2,  16 BIT QUOTIENT-1    *)
 261   SEC 2 + LDA,  SEC     LDY,  SEC 2 + STY,  .A ASL,  SEC    STA,
 262   SEC 3 + LDA,  SEC 1+  LDY,  SEC 3 + STY,  .A ROL,  SEC 1+ STA,
 263   10 # LDA,  N STA,
 264   BEGIN,  SEC 2 + ROL,  SEC 3 + ROL,  SEC 2+ LDA,
 265           CS  NOT  IF,
 266              SEC,  BOT SBC,  TAY,
 267              SEC 3 + LDA,  BOT 1+ SBC,
 268              CS IF,  SEC 2+ STY,  LABEL L442  SEC 3 + STA,  THEN,
 269           SEC ROL,  SEC 1+ ROL,
 270           N DEC,  SWAP  0=
 271       UNTIL,    POP  JMP,
 272       THEN,  BOT SBC,  SEC 2+ STA,  SEC 3 + LDA,  BOT 1+ SBC,
 273       L442 JMP,
 274 END-CODE
Correction:
 272       THEN,  BOT SBC,  SEC 2+ STA,  SEC 3 + LDA,  BOT 1+ SBC, SEC,

In case the 17th bit is set (checked in line 265) the subtraction is forced but even with 17th bit is set, the resulting carry flag is not alway set (because the 16 bit from which is subtracted could be less then the divisor). In such case the carry must be set which is ROLed into to quotient.


C64 Micro Prod. Forth

The execution speed probably suffers by using zero-page indexed addressing mode in the loop.

More efficient implementation

A re-worked version of the orignal below. Saves 2 temporary byte in zero-page, handling the 17th bit in a separate case.

Standard assembly source:
;  UM/MOD 2018-04-14 J KLASEK AT )

	* = $1000

	TBITCOUNT = $74
	POP = $84

; In:
;   5,X   4,X      3,X    2,X      1,X    0,X
; +-------------+---------------+--------------+
; | Dividend L  |  Dividend H   |   Divisor    |
; +-------------+---------------+--------------+
; Out:
;   3,X   2,X      1,X    0,X
; +-------------+---------------+
; |  Remainder  |   Quotient    |
; +-------------+---------------+

UMMOD:
	LDA 4,X		; Swap dividend high and low
	LDY 2,X
	STY 4,X
	ASL		; shift dividend low
	STA 2,X
	LDA 5,X
	LDY 3,X
	STY 5,X		; dividend low in 4,X/5,X
	ROL		; shift dividend low
	STA 3,X		; dividend low in 2,X/3,X shifted left
	LDA #16
	STA TBITCOUNT
LOOP:
	ROL 4,X		; shift in bit from dividend low
	ROL 5,X
	BCC +		

	LDA 4,X		; 17th bit set, dividend is greater
	SBC 0,X		; so just subtract divisor
	STA 4,X
	LDA 5,X
	SBC 1,X		; subtraction might finish with carry cleared!
	SEC		; always set carry for quotient
	BCS ++		; save A later ...

+	SEC		; fits divisor into dividend high?
	LDA 4,X
	SBC 0,X
	TAY		; subtraction result temporarily put into Y/A
	LDA 5,X
	SBC 1,X
	BCC +++		; does not fit

	STY 4,X		; keep subtraction result 
++	STA 5,X		; entry from 17th bit case
+++	ROL 2,X		; take quotient bit from carry
	ROL 3,X		; and shift out dividend low
	DEC TBITCOUNT	; loop over 16 dividend bits
	BNE LOOP
	JMP POP

Forth assembly source:
( U/ FIXED-V2 2018-04-14 J KLASEK AT )
ASSEMBLER HEX
74 CONSTANT TBITCOUNT
CODE UM/MOD
4 ,X LDA, 2 ,X LDY, 4 ,X STY,
.A ASL, 2 ,X STA, ( SWAP LO/HI )
5 ,X LDA, 3 ,X LDY, 5 ,X STY, 
.A ROL, 3 ,X STA, ( AND SHIFT )
10 # LDA, TBITCOUNT STA,
BEGIN,
  4 ,X ROL, 5 ,X ROL, CS IF,
    4 ,X LDA, 0 ,X SBC, 4 ,X STA,
    5 ,X LDA, 1 ,X SBC, SEC,
    HERE 1+ DUP BCS, ROT ROT ( + LABEL)
  THEN,
  SEC, 4 ,X LDA, 0 ,X SBC, TAY,
  5 ,X LDA, 1 ,X SBC, >= IF, ( C = 1 )
     4 ,X STY, ( + )
     ROT DUP HERE SWAP - 1- SWAP C!
     5 ,X STA,
  THEN,
  2 ,X ROL, 3 ,X ROL,
  TBITCOUNT DEC,
0= UNTIL,
POP JMP, 

Original

This version isn't wrong but just inefficient by using a 3rd byte just to cover the 17th bit handling. This results in rotating and subtracting 24 bits instead of 16 bits ...

( U/ ORIGINAL 2018-04-13 )
ASSEMBLER HEX
74 CONSTANT TBITCOUNT
75 CONSTANT TMSB   76 CONSTANT TMP
CODE U/ORIG
TMSB STY, ( ZERO )
4 ,X LDA, 2 ,X LDY, 4 ,X STY,
.A ASL, 2 ,X STA,  ( SWAP LO/HI)
5 ,X LDA, 3 ,X LDY, 5 ,X STY, 
.A ROL, 3 ,X STA,  ( AND SHIFT )
10 # LDA, TBITCOUNT STA,
BEGIN,
  4 ,X ROL, 5 ,X ROL, TMSB ROL,
  SEC, 4 ,X LDA, 0 ,X SBC, TAY,
  5 ,X LDA, 1 ,X SBC, TMP STA,
  TMSB LDA, 0 # SBC, TMP LDA,
  >= IF,
     4 ,X STY, 5 ,X STA,
  THEN,
  2 ,X ROL, 3 ,X ROL,
  TBITCOUNT DEC,
0= UNTIL,
POP JMP, 

VolksForth

Source: Volks-forth CBM core on Github

This version is more more space-efficient because the separate case for a set bit 17 is munged together. The trick is that the 17th bit is kept in location N 6 +. Just in case the divisor does not fit into the dividend high the 17th bit is considered and if set, the subtraction result is taken either.
Because the are 17 rounds the dividend high, containing the remainder has to be shifted back by one bit. If the bit shifted out is set this indicates an overflow condition.
This version speeds up execution by storing the values into the zero-page and using the zero-page addressing mode.

Code um/mod  ( ud u -- urem uquot)
 SP X) lda  N 5 + sta                      ( take parameters from stack )
 SP )Y lda  N 4 + sta   SP 2inc            ( divisor )
 SP X) lda  N 1+  sta
 SP )Y lda  N     sta   iny                ( dividend high word )
 SP )Y lda  N 3 + sta   iny
 SP )Y lda  N 2+  sta   $11 # ldx  clc     ( dividend low word )
  [[ N 6 + ror  sec  N 1+ lda  N 5 + sbc   ( save 17th bit )
     tay  N lda  N 4 + sbc		   ( uubtract divisor from dividend high, result in Y/A )
    CS not ?[  N 6 + rol ]?                ( if divisor does not fit, is 17th bit set? )
      CS ?[ N sta  N 1+ sty ]?             ( store new dividend high )
     N 3 + rol N 2+ rol N 1+ rol N rol     ( shift dividend low into dividend high )
  dex  0= ?]                               ( all rounds finished? )
 1 # ldy  N ror  N 1+ ror                  ( fix remainder, if 17th bit set, there is )
   CS ?[ ;c: divovl ; Assembler ]?         ( an overflow! )
 N 2+  lda SP )Y sta iny                   ( store back on stack )
 N 1+  lda SP )Y sta iny
 N     lda SP )Y sta 1 # ldy
 N 3 + lda
 Puta jmp    end-code


Tali Forth

Source: Tali Forth on Github

For 65C02 (using STZ, BRA).
Also an implementation with an extra byte to keep the 17th bit.
The execution speed probably suffers by using zero-page indexed addressing mode in the loop.
Similar to the implemention presented by Garth Wilson, Corrected Forth UM/MOD primitive.

; ----------------------------------------------------------------------------
; UM/MOD ( ud u1 -- u2 u3 ) ("UM/MOD")  32/16 -> 16
; Divide double cell number by single number and return the quotient u3 as
; the TOS and remainder as NOS. All numbers are unsigned. This is the basic
; division operation the others use. Based on FIG Forth code, modified by
; Garth Wilson, see http://6502.org/source/integers/ummodfix/ummodfix.htm
l_ummod:        bra a_ummod
                .byte $06
                .word l_udmod    ; link to UDMOD
                .word z_ummod
                .byte "UM/MOD"
.scope
a_ummod:        ; TODO see if we have enough stuff on the stack

                ; prevent division by zero. We currently do not check for
                ; overflow
                lda 1,x
                ora 2,x
                bne _notzero

                lda #$09        ; "Division by zero" string
                jmp error

_notzero:       ; we loop 17 times
                lda #$11
                sta TMPCNT

_loop:          ; rotate low cell of dividend one bit left (LSB)
                rol 5,x
                rol 6,x

                ; loop control
                dec TMPCNT
                beq _done

                ; rotate hi cell of dividend one bit left (MSB)
                rol 3,x
                rol 4,x

                stz TMPADR      ; store the bit we got from hi cell MSB
                rol TMPADR

                ; subtract dividend hi cell minus divisor
                sec
                lda 3,x
                sbc 1,x
                sta TMPADR+1
                lda 4,x
                sbc 2,x

                tay             ; use Y as temporary storage
                lda TMPADR      ; include bit carried
                sbc #$00
                bcc _loop

                ; make result new dividend hi cell
                lda TMPADR+1
                sta 3,x
                sty 4,x         ; used as temp storage

                bra _loop

_done:          ; drop on from the data stack, swap quotient and remainder
                inx
                inx
                jsr l_swap

z_ummod:        rts

 


Best viewed with any browser zurück zur Startseite