TABLE OF CONTENTS

AMMX/--------------Introduction----------------
AMMX/--Instruction_Words_and_Addressing_Modes--
AMMX/-------Detecting_AMMX_in_AmigaOS----------
AMMX/--------Enabling_AMMX_in_AmigaOS----------
AMMX/BFLYW
AMMX/BSEL
AMMX/C2P
AMMX/LOAD
AMMX/LOADi
AMMX/PACKUSWB
AMMX/PACK3216
AMMX/PADDB
AMMX/PADDUSB
AMMX/PADDUSW
AMMX/PADDW
AMMX/PAND_POR_PEOR_PANDN
AMMX/PAVGB
AMMX/PCMPccB
AMMX/PCMPccW
AMMX/PMAXxB
AMMX/PMAXxW
AMMX/PMINxB
AMMX/PMINxW
AMMX/PMUL88
AMMX/PMULA
AMMX/PMULH
AMMX/PMULL
AMMX/PSUBB
AMMX/PSUBUSB
AMMX/PSUBUSW
AMMX/PSUBW
AMMX/STORE
AMMX/STOREC
AMMX/STOREi
AMMX/STOREilm
AMMX/STOREm
AMMX/TRANSHI
AMMX/TRANSLO
AMMX/UNPACK1632
AMMX/VPERM
AMMX/__MISC__

AMMX/__MISC__

____   ____                     .__
\   \ /   /____    _____ ______ |__|______   ____
 \   Y   /\__  \  /     \\____ \|  \_  __ \_/ __ \
  \     /  / __ \|  Y Y  \  |_> >  ||  | \/\  ___/
   \___/  (____  /__|_|  /   __/|__||__|    \___  >
               \/      \/|__|                   \/

Conversion from AutoDoc format to AmigaGuide (keeping the backslashes intact):

        cat AMMX_doc.txt | sed 's/\\/\\\\/g' >ram:AMMX_bspace.doc
        ad2ag ram:AMMX_bspace.doc

Undocumented Instructions (for now):

------------------------------
BANK - register bank switching
------------------------------

  BANK SrcA,SrcB,Size
; BANK SrcA,SrcB,Dest,Size
;
; "Size" is the length of the whole bundle = opcode length + bank_length (2)
;  Size = %00 : 4 bytes
;  Size = %01 : 6 bytes
;  Size = %10 : 8 bytes
;  Size = %11 : 10 bytes
; currently valid banks:
;  2 Address register banks (An,Bn)
;  4 Data register banks (D0-D7,E0-E7,E8-E15,E16-E23)

BANK    MACRO
        ;      ----CCC-DDCCAABB   AA         BB    DD
        dc.w (%0111000100000000+((\1)*%100)+(\2)+((\3)*%1000000))
        ENDM

short examples:

        BANK    0,0,%10         ; dc.w $7180 ; exemplary/redundant:
                                ; bank 0 is default
        lea     NUMBERS,a5      ; Load in A5
        BANK    0,1,%10         ; dc.w $7181
        lea     NUMBERS,a5      ; Load in B5

        moveq   #0,d0
        moveq   #0,d1
        BANK    0,0,%00         ; dc.w $7100 ; select register bank 0
                                ; (redundant, example only)
        add.l   (a5)+,d0        ;
        BANK    1,0,%00         ; select Address register Bank 1
        add.l   (a5)+,d1        ; add.l (b5)+,d1
                                ; D0 should be = D1 = 1
        rts

numbers:        dc.l
1,2

--------
MINITERM
--------

Full documentation is still TBD. MINITERM replicates the Amiga Blitter
miniterm calculations. Register usage restrictions apply as for
TRANSHI/TRANSLO. Inputs are a consecutive set of four registers
(D0-D3, D4-D7, E0-E3, etc.), assigned in order as channels A, B, C and
the miniterm. The output register carries the result.

--------
- LSLQ -
--------

   mnemonic:
        lslq <VEA>,b,d

   short:
        64 Bit shift left

   equivalent C code
        uint64_t a,b,d;
        d = b<<a;

   LSLQ is a 64 Bit shift left operation. The shift count in input a is
   taken modulo 64. While this operation shares the AMMX encoding, it is
   a full 64 Bit scalar operation. Zeroes are shifted into the LSBs.

--------
- LSRQ -
--------

   mnemonic:
        lsrq <VEA>,b,d

   short:
        64 Bit shift right

   equivalent C code
        uint64_t a,b,d;
        d = b>>a;

   LSRQ is a 64 Bit shift right operation. The shift count in input a is
   taken modulo 64. While this operation shares the AMMX encoding style,
   it is a full 64 Bit scalar operation.

-----
BFLYB
-----

Obsolete

-------------------------------------------------------------------------------
AMMX/PSUBW

   mnemonic:
        psubw <VEA>,b,d

   short:
        vector subtract short

   graphic:

      input a
   -----------------------------------------
   |   a0    |   a1    |   a2    |   a3    |
   -----------------------------------------
        |         |         |         |
        |         |         |         |
      input b
   -----------------------------------------
   |   b0    |   b1    |   b2    |   b3    |
   -----------------------------------------
        |         |         |         |
        |         |         |         \_____________________
        |         \______   \______________                 \
        |                \                 \                 \
----------------- ----------------- ----------------- -----------------
|     b0-a0     | |     b1-a1     | |     b2-a2     | |     b3-a3     |
----------------- ----------------- ----------------- -----------------
        |            ____/     ____________/                 /
        |           /         /         ____________________/
        |          /         /         /
   -----------------------------------------
   |   d0    |   d1    |   d2    |   d3    |
   -----------------------------------------

   equivalent C Code:
        int i;
        short a[4];
        short b[4];
        short d[4];

        for( i = 0 ; i<4 ; i++ )
        {
                d[i] = b[i] - a[i];
        }

   typical application cases:
        PSUBW is a plain and simple
        vectorized 16 Bit subtraction operation that performs four
        independent sub.w operations in one shot.

-------------------------------------------------------------------------------
AMMX/PADDW

   mnemonic:
        paddw <VEA>,b,d

   short:
        vector add short

   graphic:

      input a
   -----------------------------------------
   |   a0    |   a1    |   a2    |   a3    |
   -----------------------------------------
        |         |         |         |
        |         |         |         |
      input b
   -----------------------------------------
   |   b0    |   b1    |   b2    |   b3    |
   -----------------------------------------
        |         |         |         |
        |         |         |         \_____________________
        |         \______   \______________                 \
        |                \                 \                 \
----------------- ----------------- ----------------- -----------------
|     b0+a0     | |     b1+a1     | |     b2+a2     | |     b3+a3     |
----------------- ----------------- ----------------- -----------------
        |            ____/     ____________/                 /
        |           /         /         ____________________/
        |          /         /         /
   -----------------------------------------
   |   d0    |   d1    |   d2    |   d3    |
   -----------------------------------------

   equivalent C Code:
        int i;
        short a[4];
        short b[4];
        short d[4];

        for( i = 0 ; i<4 ; i++ )
        {
                d[i] = a[i] + b[i];
        }

   typical application cases:
        PADDW is a plain and simple vectorized 16 Bit addition operation
        that performs four independent add.w operations in one shot.
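
   sketch (not part of the original reference):
        The two word-wise operations above can be emulated on a host
        compiler by treating a 64 Bit value as four independent 16 Bit
        lanes. The following is a minimal sketch under stated assumptions:
        the function names paddw_sketch and psubw_sketch are hypothetical,
        lane 0 (a0/b0/d0) is assumed to sit in the most significant word,
        and the subtraction uses the b-a order shown in the PSUBW graphic.
        Results wrap around within each lane, as with add.w and sub.w.

```c
#include <stdint.h>

/* Hypothetical helper: extract 16 Bit lane i (0..3) from a 64 Bit value.
 * Assumption: lane 0 is the most significant word. */
uint16_t lane_get(uint64_t v, int i)
{
    return (uint16_t)(v >> (48 - 16 * i));
}

/* Hypothetical helper: place a 16 Bit value into lane i of a 64 Bit value. */
uint64_t lane_put(uint16_t x, int i)
{
    return (uint64_t)x << (48 - 16 * i);
}

/* Sketch of paddw <VEA>,b,d : d[i] = b[i] + a[i], wrapping per lane. */
uint64_t paddw_sketch(uint64_t a, uint64_t b)
{
    uint64_t d = 0;
    for (int i = 0; i < 4; i++) {
        /* the cast truncates to 16 Bit, so no carry leaks into a neighbour */
        uint16_t sum = (uint16_t)(lane_get(b, i) + lane_get(a, i));
        d |= lane_put(sum, i);
    }
    return d;
}

/* Sketch of psubw <VEA>,b,d : d[i] = b[i] - a[i], wrapping per lane. */
uint64_t psubw_sketch(uint64_t a, uint64_t b)
{
    uint64_t d = 0;
    for (int i = 0; i < 4; i++) {
        /* the cast truncates to 16 Bit, so no borrow leaks into a neighbour */
        uint16_t diff = (uint16_t)(lane_get(b, i) - lane_get(a, i));
        d |= lane_put(diff, i);
    }
    return d;
}
```

        A plain 64 Bit add or subtract would let carries and borrows cross
        lane boundaries; processing each lane separately and truncating to
        16 Bit is what keeps the four results independent.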
-------------------------------------------------------------------------------
AMMX/PSUBUSW

   mnemonic:
        psubusw <VEA>,b,d

   short:
        vector subtract unsigned short with saturation

   graphic:

      input a
   -----------------------------------------
   |   a0    |   a1    |   a2    |   a3    |
   -----------------------------------------
        |         |         |         |
        |         |         |         |
      input b
   -----------------------------------------
   |   b0    |   b1    |   b2    |   b3    |
   -----------------------------------------
        |         |         |         |
        |         |         |         \_____________________
        |         \______   \______________                 \
        |                \                 \                 \
----------------- ----------------- ----------------- -----------------
| max(0,b0-a0)  | | max(0,b1-a1)  | | max(0,b2-a2)  | | max(0,b3-a3)  |
----------------- ----------------- ----------------- -----------------
        |            ____/     ____________/                 /
        |           /         /         ____________________/
        |          /         /         /
   -----------------------------------------
   |   d0    |   d1    |   d2    |   d3    |
   -----------------------------------------

   equivalent C Code:
        int i;
        unsigned short a[4];
        unsigned short b[4];
        unsigned short d[4];

        for( i = 0 ; i<4 ; i++ )
        {
                d[i] = max(0, b[i] - a[i] );
        }

   typical application cases:
        PSUBUSW might come in handy for pixel manipulation with accuracy
        demands beyond 8 Bit per gun. Subtraction results are implicitly
        saturated, so the complete 16 Bit range stays available and
        dithering offsets, for example, can be subtracted without
        wrapping around.
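
   sketch (not part of the original reference):
        For an unsigned subtraction, "with saturation" is read here as
        clamping at zero instead of wrapping around; that reading is an
        assumption, as are the hypothetical name psubusw_sketch and the
        lane order (lane 0 in the most significant word). A minimal host
        side sketch of the b-a operation on four 16 Bit lanes:

```c
#include <stdint.h>

/* Sketch of psubusw <VEA>,b,d : d[i] = b[i] - a[i], clamped at 0.
 * Assumptions: lane 0 is the most significant word of the 64 Bit value,
 * and saturation means "never below zero" for unsigned operands. */
uint64_t psubusw_sketch(uint64_t a, uint64_t b)
{
    uint64_t d = 0;
    for (int i = 0; i < 4; i++) {
        int shift = 48 - 16 * i;
        uint16_t ai = (uint16_t)(a >> shift);
        uint16_t bi = (uint16_t)(b >> shift);
        /* clamp instead of letting the result wrap to a large value */
        uint16_t di = (bi > ai) ? (uint16_t)(bi - ai) : 0;
        d |= (uint64_t)di << shift;
    }
    return d;
}
```

        With a = $0001 and b = $0000 in one lane, a wrapping sub.w would
        yield $ffff, while the saturating form yields $0000. That is the
        behaviour wanted when subtracting an offset from a colour value.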
-------------------------------------------------------------------------------
AMMX/PADDUSW

   mnemonic:
        paddusw <VEA>,b,d

   short:
        vector add unsigned short with saturation

   graphic:

      input a
   -----------------------------------------
   |   a0    |   a1    |   a2    |   a3    |
   -----------------------------------------
        |         |         |         |
        |         |         |         |
      input b
   -----------------------------------------
   |   b0    |   b1    |   b2    |   b3    |
   -----------------------------------------
        |         |         |         |
        |         |         |         \_____________________
        |         \______   \______________                 \
        |                \                 \                 \
----------------- ----------------- ----------------- -----------------
|min($ffff,b0+a0| |min($ffff,b1+a1| |min($ffff,b2+a2| |min($ffff,b3+a3|
----------------- ----------------- ----------------- -----------------
        | ...