MACHINE INSTRUCTIONS AND HIGH LEVEL LANGUAGES - THE SAD DISCONNECT -----Warren D. Smith 2000----------------------------------------- I have no idea why HLLs (high level computer languages) intentionally try to make it difficult for you to access underlying machine primitives (add-with-carry, for example; double-length multiply and accumulate; double-length integer divide; circular bitshift). In such cases HLLs are not helping, they are impeding... My opinion is, HLLs should try to provide all machine instructions (for the union of most machine instruction sets). On a machine without one of these instructions, the HLL will have to compile to a software patch instead of hardware. But, it has been impossible to get HLL-writers to do this obvious thing, even after 40 years. This in turn leads the hardware designers not to bother to make useful instructions since the HLLs won't use them anyway, so "clearly" those instructions would not be useful. A vicious cycle of idiocy. I would also like to point out that the following instructions ought to be provided on more machines and HLLs: BIT MANIPULATION: popcount(x) returns number of 1's in binary expansion of x. Very useful for Hamming distance, error correcting code stuff, cardinalities of sets as bit vectors, computer chess, etc. find-first-one(x) finds location of the most significant 1 in x's binary. This is needed anyhow for integer to floating point conversion. Very useful for computer chess, looping thru elements of sets as bit vectors TOGGLEbit(x, n) XNORs bit #n of x with 1. SETbit(x, n) ORs ... with 1. ZERObit(x, n) ANDs ... with 0. GETbit(x, n) returns nth bit of x. perhaps these should be available, not just for single bits (the nth bit) but in fact for all bits 1..n at once, or all bits n..top at once. All these can be achieved using barrel shift and usual AND/OR/XOR instructions already about 3 times slower, but are used so often maybe should be available directly. Similarly max(x,y), min(x,y), swap(x,y) are used so often they may be worth having as 1 instruction. (Perhaps sort2(x,y) also: overwrites x with min(x,y), y with max.) Or at least as HLL-builtins. multiply-without-carry(a,b) just like computing a*b (double length integer) except all the binary adds in the shift & add algorithm, are replaced with carryless adds, i.e. XORs. Would be extremely useful for bit manipulations of all kinds. reverse-bit-order(x) the bits of x are reversed in their order. This allows access (with barrel shifting) to the full dihedral permutation group instead of just the cyclic group. Also very useful for FFT and butterfly algorithms. perfect-shuffle-bits(x) the w bits of x are permuted as 0-->0 1-->2 2-->4 3-->6 4-->8 5-->10 ... w/2-1-->w-1 w/2 -->1 w/2+1-->3 ... w-1 --> w-2 also the inverse of this permutation should be available. This allows access (with barrel shifting) to the full affine permutation group instead of just the cyclic group. Also, it is very useful in case you want to store matrices in "bit-interleaved order" since this permits the bit-interleaving in the address computation to be efficient. This matrix storage method is much better for cache performance when e.g. you do matrix transposition or multiplication. Also, if the ideas discussed in Knuth re presenting complex #s in a single machine word with a complex radix like (1+i) were ever implemented, again this instruction would be very useful. rightshift(x, n) shifts bits of x right by n. leftshift(x, n) shifts bits of x left by n. circshift(x, n) circular shifts bits of x right by n. (Note if n>wordsize/2, will cause left shifts.) These all should be available both with and without an extra "carry" bit. Of course add and subtract should be available with "carry" too. NUMBER THEORY: GCD(x,y) should be available in hardware so that rational numbers can be supported simply and so computers can catch up with 8 year old children. There are well known methods ("binary method") for performing this fast in hardware. Extended GCD would not be a bad idea either, or at least the computation of multiplicative inverse of A mod B. So that we can do modular arithmetic for a change. SIMD PARALLEL MANIPULATION: "Smart bit arrays" or arrays of "smart registers" would enable doing the same operation to all bits, or all registers, simultaneously in parallel (some operations could involve neighbor bits in a 2D bit array). This capability could vastly speed up many operations on 2D cellular automata, pixel graphics, vector operations, string processing, computer go, computer chess, etc. Another idea is for the processor to contain a "programmable gate array". Here you "design your own hardware" in software. The gate array reconfigures under software control to build a device to do what you want. Then you run that device a zillion times. This could vastly speed up circuit simulation and various special purpose tasks. HARDWARE MACROS? A related concept would be "hardware macros" where you download a special sequence of instructions into the processor which "compiles it into microcode." Then you can call that microcode as one instruction. This could run far faster than just doing the instructions since less memory bandwidth needed. TRUE (AND PSEUDO) RANDOM NUMBERS: Should be available in hardware. Key for Monte-Carlo simulations, video and audio signal processing. Same thing for system clock time (Key for profiling.) IEEE FLOATING POINT PRIMITIVES: like "the next larger real", "multiply with round up", "multiply with round down" etc all should be available, not as subroutine calls (I cannot believe that with most or all high level languages today, if you want to do x=floor(y) you need to make a SUBROUTINE call!!! Outrageous.) One common bug is, in machines with 80-bit extended reals, if you do a=b; ... then later test if a==b, you may get different results depending on whether the compiler or hardware unpredictably decided to move a or b to memory (64 bit format!) between the assignment and the test! The problem here is the sometime use of non-IEEE floating point (i.e. 80-bit extended) in an attempt to be "better than IEEE." This causes non-reproducibility and makes programs impossible to debug. IEEE standard actually should mean "IEEE standard," dammit.