iApplianceWeb.com


ARM cranks up the clock of next-generation processor core

By EE Times
(04/26/02, 03:16:12 PM EDT)

SAN MATEO, Calif. - ARM Holdings plc will make a big performance push with its next-generation ARM11 microarchitecture, which will reach speeds of 750 MHz based on 0.13-micron design rules, according to a presentation posted on the company's Web site. The architecture will be ready for market by the end of the year.

While ARM's processor cores are widely used in cellular phones and PDAs where low-power operation is key, the company has been under increasing pressure to crank up performance to fend off competitors and clear a path to new markets in the embedded systems space, according to analysts.

There's also a growing perception that ARM has been overshadowed by the performance claims of Intel Corp., an ARM licensee and purveyor of XScale processors. Intel's processors could put ARM's other licensees in a bind.

"What ARM has to worry about is some licensees whose chips suddenly end up having an unfair advantage competing in the marketplace," said Cary Snyder, an analyst with MicroDesign Resources.

The ARM11 could help level the playing field. Besides touting higher clock frequencies, it will be the first microarchitecture to implement ARM's newest v6 instruction set, which includes SIMD instructions for accelerating multimedia applications. The v6 instructions also include enhanced memory management, multiprocessing capabilities, improved data handling and more efficient interrupt management.

ARM is expected to publicly disclose details of ARM11 at next week's Embedded Processor Forum in San Jose, Calif.

ARM declined to release details on its forthcoming processors before next Tuesday (April 30). However, a presentation about the ARM11, authored by lead designer Ian Devereux and posted on the company's Web site, said the processor, based on 0.13-micron design rules, aims for worst-case clock speeds of 350 MHz to 500 MHz. Typical clock speeds will range from 533 MHz to 750 MHz.

To boost clock frequency, ARM has lengthened its processor's pipeline to eight stages with single-issue, out-of-order completion. The Level 1 cache can be accessed in two cycles. The pipeline is described as a balanced design built around the timing of the arithmetic logic unit (ALU) and Level 1 memory access.

To offset the need for more control code in the longer pipeline, the ARM11 will implement more elaborate dynamic branch prediction. Other elements of the control flow operation include the use of return stacks to minimize the impact of procedure returns as well as early jumps, according to ARM.
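The presentation does not say which prediction scheme the ARM11 uses. As a rough illustration of the two ideas named above, the following Python sketch pairs a textbook two-bit saturating-counter predictor with a small return stack. The class names, table size and stack depth are hypothetical and are not taken from ARM's material.

```python
class TwoBitPredictor:
    """Textbook dynamic predictor: one 2-bit saturating counter per table entry.

    Illustrative only; the table size and indexing are assumptions, not ARM11 details.
    """

    def __init__(self, entries=64):
        self.entries = entries
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        self.counters = [1] * entries

    def _index(self, pc):
        # Assume 4-byte instructions, so drop the two low address bits.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


class ReturnStack:
    """Small stack of predicted return addresses (depth is an assumption)."""

    def __init__(self, depth=3):
        self.depth = depth
        self.stack = []

    def push(self, return_address):
        # On a call: remember where the matching return should go.
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # discard the oldest entry when full
        self.stack.append(return_address)

    def pop(self):
        # On a return: predict the most recently pushed address, if any.
        return self.stack.pop() if self.stack else None


def count_mispredictions(predictor, pc, outcomes):
    """Run one branch through the predictor and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

In this model, a loop branch that is taken many times and then falls through is mispredicted only while the counter warms up and again at the loop exit, which is why a longer pipeline benefits from this kind of history.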

The load/store portion of the pipeline has been "decoupled" to provide high data bandwidth. It has 64-bit data paths that allow two registers to be read or written every clock cycle. It also has the ability to retire instructions before completion and to restore registers in the background during execution.
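For a sense of scale, the short Python sketch below works out the peak transfer rate implied by a 64-bit data path at the clock speeds quoted in the presentation. It assumes one full-width transfer every cycle, which is an idealization; the function name is hypothetical.

```python
REGISTER_BITS = 32  # ARM general-purpose registers are 32 bits wide


def peak_bandwidth_mb_per_s(clock_mhz, path_bits=64):
    """Idealized peak: one full-width transfer on every clock cycle."""
    bytes_per_cycle = path_bits // 8
    # MHz (10^6 cycles/s) times bytes per cycle gives 10^6 bytes per second.
    return clock_mhz * bytes_per_cycle


registers_per_cycle = 64 // REGISTER_BITS  # two 32-bit registers per transfer

for clock_mhz in (350, 500, 533, 750):
    print(f"{clock_mhz} MHz: {peak_bandwidth_mb_per_s(clock_mhz)} MB/s peak")
```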

To keep power consumption in check, ARM11 includes clock-gating across nearly all the registers, can disable unused logic and has the ability to shut down the entire clock network when waiting for an interrupt. As a result, the ARM11 and cache controllers will dissipate less than 0.4 milliwatts per MHz from a core that is roughly 50 percent larger than previous cores, according to the Web presentation.
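The per-MHz figure translates into an upper bound on power at each of the quoted clock speeds. The Python sketch below does that arithmetic. It assumes dissipation scales linearly with frequency at the stated ceiling, which the presentation does not spell out, and the function name is hypothetical.

```python
MAX_MW_PER_MHZ = 0.4  # ceiling quoted in the Web presentation


def max_power_mw(clock_mhz, mw_per_mhz=MAX_MW_PER_MHZ):
    """Upper bound on core-plus-cache-controller power at a given clock."""
    return clock_mhz * mw_per_mhz


# Worst-case and typical clock speeds quoted for the 0.13-micron design.
for clock_mhz in (350, 500, 533, 750):
    print(f"{clock_mhz} MHz: less than {max_power_mw(clock_mhz):.1f} mW")
```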

Since its posting earlier this week, the ARM11 presentation has been removed from the ARM Web site.

For efficient processing of multimedia instructions, the processor's ALU will implement both standard 32-bit operations and SIMD operations to minimize power and gate usage. There's a three-stage multiplier that can sustain dual 16 x 16 multiply-accumulate operations per cycle while remaining synchronized with ALU operations for simpler control. Most v5TE and v6 multiplies can be issued back-to-back, each taking one cycle, according to the Web presentation.
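To make the dual 16 x 16 multiply-accumulate concrete, here is a minimal Python model of the operation: two signed 16-bit values packed into each 32-bit word, both products added to an accumulator in one step. The packing layout and helper names are assumptions for illustration, and the model ignores 32-bit overflow behavior.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed halfword."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack16(low, high):
    """Pack two 16-bit values into one 32-bit word (layout is an assumption)."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


def dual_mac(acc, a, b):
    """One modeled step: acc + a.low * b.low + a.high * b.high."""
    low_product = to_signed16(a) * to_signed16(b)
    high_product = to_signed16(a >> 16) * to_signed16(b >> 16)
    # Python integers are unbounded, so no 32-bit wraparound is modeled here.
    return acc + low_product + high_product


def dot_product(xs, ys):
    """Dot product of two even-length lists, two elements per modeled step."""
    acc = 0
    for i in range(0, len(xs), 2):
        acc = dual_mac(acc, pack16(xs[i], xs[i + 1]), pack16(ys[i], ys[i + 1]))
    return acc
```

A four-element dot product takes two modeled steps instead of four separate multiply-accumulates, which is the kind of saving that matters in audio and video filter loops.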

To address the growing importance of Java, ARM11 includes the company's Jazelle technology for executing Java byte codes. Decoding of Java byte code is divided across two pipeline stages, extending the pipeline by one stage during Java state. Both dynamic and static branch prediction are used for Java branches.

To mitigate the effect of interrupt latencies, the ARM11 takes advantage of v6 instructions to reduce time spent in exception handling, allows direct attachment of a vectored interrupt controller and includes a low-interrupt-latency mode, according to the online presentation.

The processor is also designed to support optional floating-point operations and application-specific co-processors. ARM expects to tape out and publicly announce its first ARM11 core by the fourth quarter.



