My Talk At the September 22, 2007 Forth meeting.

Last Up date: 2007 October 2
This Page started on: 2007 September 30 11:00:13
This is a new "LaForth" page: The old one is here.

I was invited to give a talk at the Silicon Valley Fig Forth Meeting on 2007 September 22. I like talking with this group. I know of no other group that have more in depth knowledge of of their computers. (Forth almost requires and appeals to that sort of user.)

When programming in Assembly Language I am in control of what my computer is doing, at least for a while; I find that satisfying. But the current vogue among computer users is to get as far from the real hardware as possible. Years ago this made sense when there were many different computer architectures and Universities, teachers and software vendors wanted something that could be used on a variety of computers. Educational efforts were to have "Standard Languages" and avoid machine dependent programming. Assembly is the epitome of using the features of a given machine.

Today the world is different. It looks like the Intel 8086 architecture and its descendents will be with us for ever. Even Apple has dropped the Motorola and Power PC chips and gone to the Intel line. So I ask: Why not take advantage of every idiosyncrasy of that instruction set?

The underlying theme of my talk was to show that compilers can never generate even good code, simply because there is no way to communicate to the compiler all the programmer knows about the problem and the nature of the data it designed to work on.

I chose to talk about generating a table of Prime Numbers using every trick I know of to make it as fast as possible.

  1. All primes except 2 are odd numbers, therefore there is no point in keeping the low order bit of the numbers we are dealing with. This effectively doubles the size of values that can be in a fixed size register. How do you tell a C compiler that?
  2. Avoiding the testing of even numbers is obvious. Only one more instruction:
    xor cl,3 ;Avoids multiples of 3. Remember we are not keeping the low order bit.
    Will your compiler to generate a single byte XOR!
  3. Consecutive primes don't differ by as much as 512 until we get to truly huge primes. Thus, if we output the differences we only need one byte per prime number. Few compilers keep small values in a single byte without the low order bit!
  4. We can use an algorithm that never uses a divide instruction and the core loop is:
            qtest:  scasd           ;Test word in EAX against memory table
                    jb      qtest
    Yes, it is only two instructions! The Intel Scan String Double word instruction is only one byte long, with a two byte jump back to it. This loop, scanning a growing table of composite numbers is where the vast majority of the time is spent when the program is running.

    I know of no compiler that generates the Intel string instructions. Not even Forth!

  5. Finally, because consecutive primes differ by so little we don't even need to keep the high order parts of the values we are using. This involves a bit of tricky programming, but allowes me to generate primes far greater than 216 using 16 bit registers and no multiple precision arithmetic.

Some details about a run where I generated 163,840,000 primes (the last one was: 3,424,462,051) numbers are here. I have a link to the notes I used for the talk, and a link to a .ZIP file where the programs I talked about can be down-loaded. However, initially at least, the use of the programs may be best understood by those who heard the talk.


You can: E-mail me or Click on the phone numbers for my BIO:

Go to: My Home Page My phone #'s Go to: This page TOP