This Page started on: 2007 September 30 11:00:13

This is a new "LaForth" page: The old one is here.

I was invited to give a talk at the Silicon Valley Fig Forth Meeting on 2007 September 22. I like talking with this group. I know of no other group that have more in depth knowledge of of their computers. (Forth almost requires and appeals to that sort of user.)

When programming in Assembly Language I am in control of what my computer is doing, at least for a while; I find that satisfying. But the current vogue among computer users is to get as far from the real hardware as possible. Years ago this made sense when there were many different computer architectures and Universities, teachers and software vendors wanted something that could be used on a variety of computers. Educational efforts were to have "Standard Languages" and avoid machine dependent programming. Assembly is the epitome of using the features of a given machine.

Today the world is different. It looks like the Intel 8086 architecture and its descendents will be with us for ever. Even Apple has dropped the Motorola and Power PC chips and gone to the Intel line. So I ask: Why not take advantage of every idiosyncrasy of that instruction set?

The underlying theme of my talk was to show that compilers can never generate even good code, simply because there is no way to communicate to the compiler all the programmer knows about the problem and the nature of the data it designed to work on.

I chose to talk about generating a table of Prime Numbers using every trick I know of to make it as fast as possible.

- All primes except 2 are odd numbers, therefore there is no point
in keeping the low order bit of the numbers we are dealing with. This
effectively doubles the size of values that can be in a fixed size
register.
*How do you tell a C compiler that?* - Avoiding the testing of even numbers is obvious.
Only one more instruction:
`xor cl,3 ;Avoids multiples of 3.`*Remember we are not keeping the low order bit.*

Will your compiler to generate a single byte XOR! - Consecutive primes don't differ by as much as 512 until we get to
truly huge primes. Thus, if we output the differences we only need one
byte per prime number.
*Few compilers keep small values in a single byte without the low order bit!* - It turns out this form of output lends itself to a more compact storage of the primes. (Yes, even this can be made significantly smaller in a .ZIP file.)
- Other analysis such as counting the number of prime twins is much easier, than it would be if only a decimal listing of the primes had been the main output.
- Converting the output to decimal can take several different forms
for more analysis of the primes.
*And can be done converting only a single byte to decimal then using decimal arithmetic add avoiding the slow conversion of big values from binary to decimal. This is a***real advantage**because the high decimal digits don't change often, thus we are not recalculating the same digits over and over. - We can use an algorithm that never uses
a divide instruction and the core loop is:
qtest: scasd ;Test word in EAX against memory table jb qtest

Yes, it is only two instructions! The Intel Scan String Double word instruction is only one byte long, with a two byte jump back to it.*This loop, scanning a growing table of composite numbers is where the vast majority of the time is spent when the program is running.*I know of no compiler that generates the Intel string instructions. Not even Forth!

- Finally, because consecutive primes differ by so little we don't
even need to keep the high order parts of the values we are using.
*This involves a bit of tricky programming, but allowes me to generate primes far greater than 2*^{16}using 16 bit registers and no multiple precision arithmetic.

Some details about a run where I generated 163,840,000 primes (the last
one was: 3,424,462,051) numbers are
here.
I have a
link to the notes I used for the talk,
and a
link to a .ZIP file
where the programs I talked about can be
down-loaded. *However, initially at least, the use of the programs may
be best understood by those who heard the talk.*

Go to:
My Home Page |
Go to:
This page TOP |