Forth UM/MOD for 6502 and 65C816/802 |
F.I.G. Implementation FIG65ROM.AS
Wrong implementation: Bit 17 isn't handled after ROL 5,X (2nd line after L433).
Even with zero-page indexed addressing mode accessing arguments it is faster than transfering it to a zero-page scratch area (N area) which would take more overall time.
; U/ ; SCREEN 24 LINE 1 ; L418 .BYTE $82,'U',$AF .WORD L386 ; link to U* USLAS .WORD *+2 LDA 4,X LDY 2,X STY 4,X ASL A STA 2,X LDA 5,X LDY 3,X STY 5,X ROL A STA 3,X LDA #16 STA N L433 ROL 4,X ROL 5,X SEC LDA 4,X SBC 0,X TAY LDA 5,X SBC 1,X BCC L444 STY 4,X STA 5,X L444 ROL 2,X ROL 3,X DEC N BNE L433 JMP POP
V-Forth 3.7 fix and fix in official release 3.8.
V-Forth is a FIG Forth derivate for the VIC-20 from Simon J. Rowe <srowe mose org uk>
Source: Git-Repo UM/MOD
Handles 17th bit in separate case, already very efficient, but release 3.7 had a false assumption, that the carry flag is always set, but is not in all cases.
Even with zero-page indexed addressing mode accessing arguments it is faster than transfering it to a zero-page scratch area (N area) which would take more overall time.
CODE UM/MOD ( 31 BIT DIVIDEND-2, -3, 16 BIT DIVISOR-1 *) ( 16 BIT REMAINDER-2, 16 BIT QUOTIENT-1 *) SEC 2 + LDA, SEC LDY, SEC 2 + STY, .A ASL, SEC STA, SEC 3 + LDA, SEC 1+ LDY, SEC 3 + STY, .A ROL, SEC 1+ STA, 10 # LDA, N STA, BEGIN, SEC 2 + ROL, SEC 3 + ROL, SEC 2+ LDA, CS NOT IF, SEC, BOT SBC, TAY, SEC 3 + LDA, BOT 1+ SBC, CS IF, SEC 2+ STY, BEGIN, SEC 3 + STA, SWAP THEN, SEC ROL, SEC 1+ ROL, N DEC, ROT 0= UNTIL, POP JMP, SWAP THEN, BOT SBC, SEC 2+ STA, SEC 3 + LDA, BOT 1+ SBC, SEC, CS NOT UNTIL, END-CODE
259 CODE UM/MOD ( 31 BIT DIVIDEND-2, -3, 16 BIT DIVISOR-1 *) 260 ( 16 BIT REMAINDER-2, 16 BIT QUOTIENT-1 *) 261 SEC 2 + LDA, SEC LDY, SEC 2 + STY, .A ASL, SEC STA, 262 SEC 3 + LDA, SEC 1+ LDY, SEC 3 + STY, .A ROL, SEC 1+ STA, 263 10 # LDA, N STA, 264 BEGIN, SEC 2 + ROL, SEC 3 + ROL, SEC 2+ LDA, 265 CS NOT IF, 266 SEC, BOT SBC, TAY, 267 SEC 3 + LDA, BOT 1+ SBC, 268 CS IF, SEC 2+ STY, LABEL L442 SEC 3 + STA, THEN, 269 SEC ROL, SEC 1+ ROL, 270 N DEC, SWAP 0= 271 UNTIL, POP JMP, 272 THEN, BOT SBC, SEC 2+ STA, SEC 3 + LDA, BOT 1+ SBC, 273 L442 JMP, 274 END-CODECorrection:
272 THEN, BOT SBC, SEC 2+ STA, SEC 3 + LDA, BOT 1+ SBC, SEC,
In case the 17th bit is set (checked in line 265) the subtraction is forced but even with 17th bit is set, the resulting carry flag is not alway set (because the 16 bit from which is subtracted could be less then the divisor). In such case the carry must be set which is ROLed into to quotient.
Even with zero-page indexed addressing mode accessing arguments it is faster than transfering it to a zero-page scratch area (N area) which would take more overall time.
A re-worked version of the orignal below. Saves 2 temporary byte in zero-page, handling the 17th bit in a separate case.
Standard assembly source:; UM/MOD 2018-04-14 J KLASEK AT ) * = $1000 TBITCOUNT = $74 POP = $84 ; In: ; 5,X 4,X 3,X 2,X 1,X 0,X ; +-------------+---------------+--------------+ ; | Dividend L | Dividend H | Divisor | ; +-------------+---------------+--------------+ ; Out: ; 3,X 2,X 1,X 0,X ; +-------------+---------------+ ; | Remainder | Quotient | ; +-------------+---------------+ UMMOD: LDA 4,X ; Swap dividend high and low LDY 2,X STY 4,X ASL ; shift dividend low STA 2,X LDA 5,X LDY 3,X STY 5,X ; dividend low in 4,X/5,X ROL ; shift dividend low STA 3,X ; dividend low in 2,X/3,X shifted left LDA #16 STA TBITCOUNT LOOP: ROL 4,X ; shift in bit from dividend low ROL 5,X BCC + LDA 4,X ; 17th bit set, dividend is greater SBC 0,X ; so just subtract divisor STA 4,X LDA 5,X SBC 1,X ; subtraction might finish with carry cleared! SEC ; always set carry for quotient BCS ++ ; save A later ... + SEC ; fits divisor into dividend high? LDA 4,X SBC 0,X TAY ; subtraction result temporarily put into Y/A LDA 5,X SBC 1,X BCC +++ ; does not fit STY 4,X ; keep subtraction result ++ STA 5,X ; entry from 17th bit case +++ ROL 2,X ; take quotient bit from carry ROL 3,X ; and shift out dividend low DEC TBITCOUNT ; loop over 16 dividend bits BNE LOOP JMP POPForth assembly source:
( U/ FIXED-V2 2018-04-14 J KLASEK AT ) ASSEMBLER HEX 74 CONSTANT TBITCOUNT CODE UM/MOD 4 ,X LDA, 2 ,X LDY, 4 ,X STY, .A ASL, 2 ,X STA, ( SWAP LO/HI ) 5 ,X LDA, 3 ,X LDY, 5 ,X STY, .A ROL, 3 ,X STA, ( AND SHIFT ) 10 # LDA, TBITCOUNT STA, BEGIN, 4 ,X ROL, 5 ,X ROL, CS IF, 4 ,X LDA, 0 ,X SBC, 4 ,X STA, 5 ,X LDA, 1 ,X SBC, SEC, HERE 1+ DUP BCS, ROT ROT ( + LABEL) THEN, SEC, 4 ,X LDA, 0 ,X SBC, TAY, 5 ,X LDA, 1 ,X SBC, >= IF, ( C = 1 ) 4 ,X STY, ( + ) ROT DUP HERE SWAP - 1- SWAP C! 5 ,X STA, THEN, 2 ,X ROL, 3 ,X ROL, TBITCOUNT DEC, 0= UNTIL, POP JMP,
This version isn't wrong but just inefficient by using a 3rd byte just to cover the 17th bit handling. This results in rotating and subtracting 24 bits instead of 16 bits ...
( U/ ORIGINAL 2018-04-13 ) ASSEMBLER HEX 74 CONSTANT TBITCOUNT 75 CONSTANT TMSB 76 CONSTANT TMP CODE U/ORIG TMSB STY, ( ZERO ) 4 ,X LDA, 2 ,X LDY, 4 ,X STY, .A ASL, 2 ,X STA, ( SWAP LO/HI) 5 ,X LDA, 3 ,X LDY, 5 ,X STY, .A ROL, 3 ,X STA, ( AND SHIFT ) 10 # LDA, TBITCOUNT STA, BEGIN, 4 ,X ROL, 5 ,X ROL, TMSB ROL, SEC, 4 ,X LDA, 0 ,X SBC, TAY, 5 ,X LDA, 1 ,X SBC, TMP STA, TMSB LDA, 0 # SBC, TMP LDA, >= IF, 4 ,X STY, 5 ,X STA, THEN, 2 ,X ROL, 3 ,X ROL, TBITCOUNT DEC, 0= UNTIL, POP JMP,
Source: Volks-forth CBM core on Github
This version is more more space-efficient because the separate case for a set bit 17 is munged together. The trick is that the 17th bit is kept in location N 6 +. Just in case the divisor does not fit into the dividend high the 17th bit is considered and if set, the subtraction result is taken either.
Because the are 17 rounds the dividend high, containing the remainder has to be shifted back by one bit. If the bit shifted out is set this indicates an overflow condition.
The parameter stack implemention (zero-page indirect indexed) here needs to store the values into the zero-page working area (N) because accessing the stack elements is not easily possible (in comparison to zero-page indexed).
Code um/mod ( ud u -- urem uquot) SP X) lda N 5 + sta ( take parameters from stack ) SP )Y lda N 4 + sta SP 2inc ( divisor ) SP X) lda N 1+ sta SP )Y lda N sta iny ( dividend high word ) SP )Y lda N 3 + sta iny SP )Y lda N 2+ sta $11 # ldx clc ( dividend low word ) [[ N 6 + ror sec N 1+ lda N 5 + sbc ( save 17th bit ) tay N lda N 4 + sbc ( uubtract divisor from dividend high, result in Y/A ) CS not ?[ N 6 + rol ]? ( if divisor does not fit, is 17th bit set? ) CS ?[ N sta N 1+ sty ]? ( store new dividend high ) N 3 + rol N 2+ rol N 1+ rol N rol ( shift dividend low into dividend high ) dex 0= ?] ( all rounds finished? ) 1 # ldy N ror N 1+ ror ( fix remainder, if 17th bit set, there is ) CS ?[ ;c: divovl ; Assembler ]? ( an overflow! ) N 2+ lda SP )Y sta iny ( store back on stack ) N 1+ lda SP )Y sta iny N lda SP )Y sta 1 # ldy N 3 + lda Puta jmp end-code
Source: Tali Forth on Github
For 65C02 (using STZ, BRA).
Also an implementation with an extra byte to keep the 17th bit.
Even with zero-page indexed addressing mode accessing arguments it is faster than transfering it to a zero-page scratch area (N area) which would take more overall time.
Similar to the implemention presented by Garth Wilson, Corrected Forth UM/MOD primitive.
; ---------------------------------------------------------------------------- ; UM/MOD ( ud u1 -- u2 u3 ) ("UM/MOD") 32/16 -> 16 ; Divide double cell number by single number and return the quotient u3 as ; the TOS and remainder as NOS. All numbers are unsigned. This is the basic ; division operation the others use. Based on FIG Forth code, modified by ; Garth Wilson, see http://6502.org/source/integers/ummodfix/ummodfix.htm l_ummod: bra a_ummod .byte $06 .word l_udmod ; link to UDMOD .word z_ummod .byte "UM/MOD" .scope a_ummod: ; TODO see if we have enough stuff on the stack ; prevent division by zero. We currently do not check for ; overflow lda 1,x ora 2,x bne _notzero lda #$09 ; "Division by zero" string jmp error _notzero: ; we loop 17 times lda #$11 sta TMPCNT _loop: ; rotate low cell of dividend one bit left (LSB) rol 5,x rol 6,x ; loop control dec TMPCNT beq _done ; rotate hi cell of dividend one bit left (MSB) rol 3,x rol 4,x stz TMPADR ; store the bit we got from hi cell MSB rol TMPADR ; subtract dividend hi cell minus divisor sec lda 3,x sbc 1,x sta TMPADR+1 lda 4,x sbc 2,x tay ; use Y as temporary storage lda TMPADR ; include bit carried sbc #$00 bcc _loop ; make result new dividend hi cell lda TMPADR+1 sta 3,x sty 4,x ; used as temp storage bra _loop _done: ; drop on from the data stack, swap quotient and remainder inx inx jsr l_swap z_ummod: rts
Even with zero-page indexed addressing mode accessing arguments it is faster than transfering it to a zero-page scratch area (N area) which would take more overall time.
Notes:#HEADER "UM/MOD", NOT_IMMEDIATE ; ( ud u -- rem quot ) UMsMOD #CODE ; +-----------+------------------------+ ; | DIVISOR | D I V I D E N D | ; | | hi cell lo cell | ; +-----------+-----------+------------+ ; | 0,X 1,X | 2,X 3,X | 4,X 5,X | ; +-----------+-----------+------------+ ; Detect overflow or /0 condition: ; Subtract divisor from hi cell of dividend LDA 2,X ; 2/5 C flag set means, divisor was not big enough CMP 0,X ; 2/5 to avoid overflow. BCS uoflo ; 2/2|3 This also takes care of any /0 conditions. STY IP ; 2/4 Save IP because Y is needed a loop counter. LDY #16 ; 3/3 We will loop 16x; but since we shift the ; dividend over at the same time as shifting ; the answer in, the operation must start AND ; finish with a shift of the lo cell of the ; dividend (which ends up holding the quotient). ; The hi cell of dividend will be kept in accum. ASL 4,X ; 2/8 Initial lo cell of dividend shift. ; 13/27 ; ------ ushft2: ROL ; 1/2 Shift hi cell of dividend left one bit, also ; if 17th bit of dividend is set, divisor always BCS umsb ; 2/2|3 fits. ; Does divisor fit into hi 16 bits of dividend? ; Just compare (keeping A intact). CMP 0,X ; 2/5 If carry got cleared, divisor did not fit. BCC unotin ; 2/2|3 determining if the divisor fit into the hi umsb: ; Carry always set here. ; 16 bit of the dividend. 17th bit is always 0. SBC 0,X ; 2/5|- Divisor fits, so actually do the subtract SEC ; 1/2|- With 17th bit set carry might be zero because ; the low 16 bits could be less than the divisor. unotin: ROL 4,X ; 2/8 Move lo cell of dividend left one bit, also ; shifting answer in. DEY ; 1/2 later gets pushed off the other end in the BNE ushft2 ; 2/3|2 last rotation. ; 15/ (31*quot_bit_1) ; +(25*quot_bit17_bit_1) ; +(23*quot_bit_0) ; -1 ; ------ LDY IP ; 2/4 Restore IP uleave: ; Put quotient and remainder on stack where STA 2,X ; 2/5 dividend used to be, and restore the SP. #INX2 ; 2/4 Leave 2 cells on stack. JMP USswap ; 3/24 Swap quotient and remainder and restore IP ; 9/37 ; ------ ; 44/27 ; +(31*quot_bit_1) ; +(25*quot_bit17_bit_1) ; +(23*quot_bit_0)-1 ; -1 ; +37 ; ====== Normal uoflo: LDA #$FFFF ; 3/3 An overflow or /0 condition occured: STA 4,X ; 2/5 Both the quotient BRA uleave ; 2/3 and reminder to -1! ; 7/61 (13+11+37) ; ====== Overflow (intro+uleave)
Implemention presented by Garth Wilson, Corrected Forth UM/MOD primitive.
Notes:INDEX_16: MACRO ; Put X & Y into 16-bit mode. REP #00010000B ; NOP ; NOP necessary for early versions of ENDM ; '802 and '816 >4MHz. ;------------------- INDEX_8: MACRO ; Put X & Y into 8-bit mode. SEP #00010000B ; NOP ; NOP necessary for early versions of ENDM ; '802 and '816 >4MHz. ;------------------- HEADER "UM/MOD", NOT_IMMEDIATE ; ( ud u -- rem quot ) UMsMOD: CODE ; To save a cycle from every stack load or store, move TXY ; the inputs to N. Save stack pointer to restore later INY ; with one less cell. (At the exit, N+2 below will be INY ; transferred to the top of the stack, and N, N+1 where STY XSAVE ; the divisor is, will be dropped in the bit bucket.) ; +-----|-----+-----|-----+-----|-----+-----|------+ LDY #< N ; | DIVISOR | D I V I D E N D |temp storage| LDA #5 ; | | hi cell lo cell |of carry bit| MVN 0,0 ; | N N+1 | N+2 N+3 | N+4 N+5 | N+6 N+7 | ; +-----|-----+-----|-----+-----|-----+-----|------+ SEC ; Detect overflow or /0 condition. To find out, sub- LDA N+2 ; tract divisor from hi cell of dividend; if C flag SBC N ; remains set, divisor was not big enough to avoid BCS uoflo ; overflow. This also takes care of any /0 conditions. ; If branch not taken, C flag is left clear for 1st ROL. ; We will loop 16x; but since we shift the dividend LDX #<$11 ; over at the same time as shifting the answer in, the ; operation must start AND finish with a shift of the ; lo cell of the dividend (which ends up holding the INDEX_16 ; quotient), so we start with 17 in X. We will use Y ; for temporary storage too, so set index reg.s 16-bit. ushftl: ROL N+4 ; Move lo cell of dividend left one bit, also shifting ; answer in. The 1st rotation brings in a 0, which DEX ; later gets pushed off the other end in the last BEQ umend ; rotation. Branch to the end if finished. ROL N+2 ; Shift hi cell of dividend left one bit, also shifting LDA #0 ; next bit in from high bit of lo cell. ROL A STA N+6 ; Store old hi bit of dividend in N+6. SEC ; See if divisor will fit into hi 17 bits of dividend LDA N+2 ; by subtracting and then looking at carry flag. SBC N ; If carry got cleared, divisor did not fit. TAY ; Save the difference in Y until we know if we need it. LDA N+6 ; Bit 0 of N+6 serves as 17th bit. SBC #0 ; Complete the subtraction by doing the 17th bit before BCC ushftl ; determining if the divisor fit into the hi 17 bits of ; the dividend. If so, the C flag remains set. STY N+2 ; If divisor fit into hi 17 bits, update dividend hi BRA ushftl ; cell to what it would be after subtraction. Branch. uoflo: LDA #$FFFF ; If an overflow or /0 condition occurs, put FFFF in STA N+4 ; both the quotient STA N+2 ; and the remainder. umend: INDEX_8 LDX XSAVE LDA N+2 ; Put quotient and remainder on stack where dividend STA 2,X ; used to be, and restore the stack pointer. Remember LDA N+4 ; that we incremented the stack pointer above with TXY, STA 0,X ; INY, INY, STY XSAVE to knock one cell off the stack. JMP NEXT ; (Doing TXY first was to keep X's value for the MVN.) ;-------------------
zurück zur Startseite |