Tuesday, March 21, 2006

As Moore fails...

Gordon Moore became famous for one casual observation that was taken as a law. That law held good until recently. Moore's Law helped many computing companies predict how much faster their software would run after a significant number of years, assuming the software remained relevant in the industry for that long.

Everything went fine until, one fine day, we discovered the mortality of Moore's Law. Suddenly all the programmers and processor designers started looking for alternative solutions. Programmers came up with multithreading, and designers with multi-core. The two complement each other: the different threads of a multithreaded program can run on different processor cores.
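(As a rough sketch of the idea, and not anything specific to a particular processor: here is how a C program using POSIX threads hands work to multiple threads. The worker function and the thread count are made up for illustration, and it is the OS scheduler that decides which core each thread actually runs on.)

#include <pthread.h>
#include <stdio.h>

/* Hypothetical worker: each thread handles its own share of the work.
   With more than one core available, the scheduler can run them in parallel. */
static void *worker(void *arg)
{
    int id = *(int *)arg;
    printf("thread %d doing its share of the work\n", id);
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    int id[2] = {0, 1};
    int i;

    for (i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &id[i]);
    for (i = 0; i < 2; i++)
        pthread_join(t[i], NULL);   /* wait for both threads to finish */

    return 0;
}

Compile with something like cc -pthread; on a single-core machine the two threads would simply be time-sliced instead of running in parallel.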

Most processor design innovations first show up in digital signal processors and then slowly make their way into general-purpose microprocessors. Multi-core is no exception. The technology in this field grows at a very fast rate. Just when we were all thinking that multi-core was cutting edge, "Inside DSP" magazine reported that MIPS has come out with a new DSP processor built around a multithreaded licensable core, called the MIPS32 34K.

This is a single core with built-in hardware support for switching between multiple threads. MIPS's rationale for the multithreaded core is a reasonable one: much of the time the processor sits idle, waiting on memory or I/O. By switching between threads, the processor can use those otherwise idle cycles for other useful work. MIPS reports that a 360 MHz multithreaded processor can run about 50% faster than a 400 MHz single-threaded processor, all at the cost of a small increase in die size.

The 34K actually features five "Thread Contexts" (TCs). Each TC has its own program counter and register file. The processor can be tuned to switch between threads on every clock cycle.

MIPS does not stop there. The 34K also provides two virtual processing elements (VPEs). Each VPE has the features an OS needs, such as its own translation look-aside buffer, which means two operating systems can run on the same processor at the same time. Typically one will be a general-purpose OS and the other an RTOS; that is, the same processor can run both Linux and VxWorks simultaneously. MIPS also lets system programmers allocate different priorities to the two OSes, say a quarter of the cycles to Linux and the remaining three-quarters to VxWorks.

But making use of all this is in the hands of the programmer. Programmers can take advantage of the 34K's multithreading capability only if they write their code with multithreading in mind, and that adds complexity to the software development process.
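(One concrete source of that complexity: once threads share data, the programmer has to add synchronization that a single-threaded program never needed. A minimal POSIX threads sketch; the shared counter, thread count and iteration count here are all made up for illustration.)

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define NITERS   100000

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread increments the shared counter. The mutex makes the
   increments safe; without it the threads would race and the final
   total would come out wrong. */
static void *bump(void *arg)
{
    int i;
    (void)arg;
    for (i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    int i;

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, bump, NULL);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);

    printf("counter = %ld (expected %d)\n", counter, NTHREADS * NITERS);
    return 0;
}

Remove the mutex and the program still compiles and runs, but the total is wrong - exactly the kind of bug that simply did not exist in the single-threaded world.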

When Moore's Law failed, programmers thought that the programs written so far would never run any faster than they do now. But the situation changed. It made designers think differently; innovation took place, and programmers were asked to think differently to adapt to the changing environment. And they did. The net effect is that we have a new kind of system in hand. Now we know how the human race has survived and dominated over every hurdle since its origin.

3 comments:

  1. Hi Krish,
    Read all your blogs. They are really good thought-provokers. Keep up the momentum.
    Regards,
    Madhu.

  2. The main reason behind the failure of Moore's Law is the assumption that interconnect delays inside the components (logic gates) are negligible. In reality, as chips shrank, interconnect delays started showing up as a good percentage of gate delays. At 45nm, interconnect delays are said to be about 40-50% of gate delays. Because of this, the speed curve started to flatten; that is, we cannot make things faster beyond a limit.

    The above applies mainly to clocked systems, where delay and interconnect cost affect clock jitter and combinational logic delay. But even in asynchronous logic circuits, interconnect accounts for a large share of the delay.

    For such reasons, people are turning toward multi-core processing and multithreading environments. However, both require strong compilers and efficient delegation heuristics.

    Yogendra Namjoshi

  3. @Yogendra Namjoshi
    Good point. As we integrate more logic into a smaller area, the interconnect delay becomes prominent. Even at 250nm, reducing the route delay below 30% is challenging.

    But I don't agree that Moore's Law was based on the assumption that interconnect delay does not exist. Moore's Law is more of a visionary statement than a law of physics; it's based on Moore's optimism about the pace of technological growth. And it's good that we lived up to it for as long as we did.

    Interconnect delay may be one of the reasons for the move to multi-core processing, but it's not the main one. The main reason is that the uniprocessor was heating up like a boiler plate as we integrated more and more complex logic into it. To reduce the heating, multiple cores can be introduced, each simpler than the single uniprocessor (of equivalent speed) that would otherwise have been envisioned.

    Strong compilers - absolutely required. Multi-core poses new challenges to compiler design, like optimizing for locality, handling inter-processor race conditions, etc. Extensions to existing languages to support parallel processing are another ongoing effort. As for delegation heuristics, they're not really new; we did this with Beowulf clusters and Cray supercomputers. Load balancing among, say, 4 cores is not really difficult (not easy either).

    Once there are many cores, parallelism is in the hands of the programmer rather than the assembler or compiler. So what we lack is a pool of good programmers for this parallel processing environment.
