HYPER-THREADING TECHNOLOGY GUIDE
14th November 2002
Intel’s Hyper-Threading technology promises extra performance for free by offering some of the benefits of a dual-CPU system with only one processor. Is it destined to be a must-have technology? A separate article will give you benchmarks of Hyper-Threading on a 3.06GHz Pentium4, so you can see what difference it makes in a variety of scenarios. In this article we try to explain as much as we can about the technology without sending our readers to sleep. A follow-up article will go into much more detail about exactly how Hyper-Threading works, to explain why some applications benefit more than others; it will be heavy going, and readers would do well to keep strong coffee within easy reach. This article, by comparison, starts out easy and gets moderately complicated, but will give readers everything they need to make the most of Hyper-Threading from the user’s point of view.
What is Hyper-Threading?
Hyper-Threading is Intel’s implementation of simultaneous multithreading (SMT): a microprocessor technology that supports the concurrent execution of multiple separate instruction streams, referred to as threads of execution, on a single physical processor. On the Intel processors that support it, Hyper-Threading provides two threads of execution per physical processor.
What does it do?
Intel expects Hyper-Threading to deliver immediate gains with existing multithreaded applications, because their separate threads can run concurrently, and promises even better performance from future applications that are Hyper-Threading optimized (for example, by avoiding competition for resources between concurrent threads). This is in contrast to extensions such as MMX and SSE/SSE2, where software had to be specifically optimized before any improvement was seen. How this is achieved is explained further in this article, and in greater detail in the follow-up article.
What do you need for Hyper-Threading to work?
The requirements are fairly straightforward. Obviously a compatible CPU is required, which in practice means a Pentium4 at 3.06GHz or above (we’ll ignore Xeon CPUs for the purposes of this article, as it’s aimed at the home user). You will also need a Hyper-Threading-enabled Intel chipset, so check with your motherboard manufacturer: they should have a list of Hyper-Threading-compatible boards on their web sites. The power and thermal requirements should not pose any problems as long as a decent P4 power supply is used. If you’re not going to use the stock heat sink (because you have an OEM chip or want to use an alternative cooling method), make sure whatever you use is at least as effective as the retail cooler. The system Intel sent us used a large all-copper heat sink and an 80mm fan, so don’t take this lightly.
What changes were made to the CPU?
The extra hardware needed to make the Pentium4 Hyper-Threading enabled has actually been present on previous Northwood processors, but was disabled until now. It adds about 5% to the size of the die. The figure is that small because many resources are shared between the two threads; otherwise virtually everything would have had to be duplicated, essentially resulting in two CPU cores on one chip, with the associated increase in cost.
How are operating systems supporting Hyper-Threading?
Hyper-Threading makes two architectural states available on the same physical processor. Each architectural state can execute an instruction stream, which means that two concurrent threads of execution can run on a single physical processor, and each thread can be independently halted or interrupted. These architectural states are referred to as logical processors.
The main difference between the execution environment provided by a Hyper-Threading processor and that provided by two traditional single-threaded processors is that Hyper-Threading shares certain processor resources: there is only one execution engine, one on-board cache set and one system bus interface. The logical processors on a Hyper-Threading processor must therefore compete for these shared resources, and as a result a Hyper-Threading processor will not perform as well as two similarly equipped single-threaded processors.
It is important to note that the two logical processors on a Hyper-Threading processor are treated equally with respect to access to the shared resources. This article refers to them, in order of use, as the first and second logical processors.
Windows XP and Windows .NET Server include generic identification of, and support for, IA-32 processors that implement Hyper-Threading, using Intel’s CPUID instruction identification mechanism. However, support is not guaranteed for processors that have not been tested with these operating systems. SMT processors may support more than two logical processors in the future, but the discussion and examples here assume two logical processors, as used in the Pentium4 3.06GHz and above.
Windows software should run unmodified, and without error, on Hyper-Threading-enabled systems. In general, multithreaded Windows applications perform better when running unmodified on a Hyper-Threading processor than on a similarly equipped single-threaded processor. The gain varies by application: the best gains are typically achieved by applications whose threads compete least for the processor’s shared resources. Windows 2000 does not, and never will, support Hyper-Threading; it is not Hyper-Threading-aware and treats each logical processor as a separate physical processor.
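The CPUID identification mechanism mentioned above can be illustrated with a small decoder. This is a sketch, not Intel’s or Microsoft’s code: it assumes the caller has already executed CPUID with EAX=1 (for example through a compiler intrinsic in C, since Python has no built-in way to do so) and passes in the resulting EBX and EDX values. The bit positions are those Intel documents for this leaf (EDX bit 28 for the HTT flag, EBX bits 23:16 for the logical processor count, EBX bits 31:24 for the initial APIC ID); the function name, field names and register values are hypothetical.

```python
from typing import NamedTuple


class Leaf1Info(NamedTuple):
    """Hyper-Threading related fields from CPUID leaf 1 (EAX=1)."""
    htt: bool                 # EDX bit 28: package can hold more than one logical processor
    logical_per_package: int  # EBX bits 23:16, only meaningful when htt is set
    initial_apic_id: int      # EBX bits 31:24, ID of the logical processor that ran CPUID


def decode_cpuid_leaf1(ebx: int, edx: int) -> Leaf1Info:
    """Decode the Hyper-Threading fields from raw CPUID leaf 1 register values."""
    htt = bool((edx >> 28) & 1)
    # Without the HTT flag the count field is not defined, so assume one logical processor.
    logical = ((ebx >> 16) & 0xFF) if htt else 1
    apic_id = (ebx >> 24) & 0xFF
    return Leaf1Info(htt, logical, apic_id)


# Hypothetical register values for the second logical processor of a two-thread package.
info = decode_cpuid_leaf1(ebx=0x01020800, edx=1 << 28)
print(info)  # → Leaf1Info(htt=True, logical_per_package=2, initial_apic_id=1)
```

As we understand Intel’s guidance, the HTT flag describes what the package is capable of rather than whether the BIOS has actually enabled the second logical processor, so counting the logical processors the operating system exposes remains the more reliable check.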
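What about BIOS support for Hyper-Threading?
The system BIOS provides two important Hyper-Threading features: it starts the logical processors and reports them to the operating system in a recommended order, and it can offer an option to disable Hyper-Threading.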
The sequence in which logical processors are started can be very important, especially when running software that is not Hyper-Threading-aware on a Hyper-Threading-enabled system.
The BIOS is responsible for starting up the logical processors. It creates a list of all the logical processors that have been started and provides it to the operating system in the Multiple APIC Description Table (MADT), which is defined in the Advanced Configuration and Power Interface (ACPI) 2.0 specification. The BIOS passes the MADT to the operating system as part of the ACPI data, and Windows will attempt to use the logical processors in the same sequence as the BIOS listed them in the MADT.
Intel recommends listing the first logical processor on each physical Hyper-Threading processor before listing any of the second logical processors, so that the operating system uses the logical processors in that order. This should help to ensure optimal performance for software that is not Hyper-Threading-aware. Performance on non-Hyper-Threading-aware versions of Windows, such as Windows 2000, may not be optimal if the BIOS does not follow this recommendation; for systems that run Windows 2000 it is critical that the BIOS lists the logical processors in the recommended sequence, otherwise system performance may be degraded.
To facilitate performance verification and to support configurations with more than 16 physical Hyper-Threading processors, Intel has recommended that BIOS vendors include an option in their BIOS menus to disable Hyper-Threading. Selecting this option causes the BIOS to start only the first logical processor on each Hyper-Threading processor and to disable the second. With Hyper-Threading disabled, the MADT gives the operating system information only about the first logical processors, and none of the second logical processors are used.
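Intel’s ordering recommendation is easy to see in a toy model. The sketch below is our own illustration, not BIOS code: it lists (physical, logical) processor pairs in the recommended order and in a hypothetical “package-major” order that lists both logical processors of a package together, then shows which physical processors an operating system would touch if it used only the first n entries. The function names and the package-major order are assumptions made for illustration.

```python
from typing import List, Set, Tuple

Entry = Tuple[int, int]  # (physical processor index, logical processor index within it)


def recommended_order(num_physical: int, logical_per_physical: int = 2,
                      ht_enabled: bool = True) -> List[Entry]:
    """First logical processor of every package, then every second one, and so on."""
    per_package = logical_per_physical if ht_enabled else 1  # HT disabled: first only
    return [(phys, logical)
            for logical in range(per_package)
            for phys in range(num_physical)]


def package_major_order(num_physical: int, logical_per_physical: int = 2) -> List[Entry]:
    """Hypothetical ordering that lists both logical processors of a package together."""
    return [(phys, logical)
            for phys in range(num_physical)
            for logical in range(logical_per_physical)]


def physical_used(order: List[Entry], n: int) -> Set[int]:
    """Physical processors touched when the OS uses only the first n listed entries."""
    return {phys for phys, _ in order[:n]}


print(recommended_order(2))                      # [(0, 0), (1, 0), (0, 1), (1, 1)]
print(physical_used(recommended_order(2), 2))    # {0, 1}
print(physical_used(package_major_order(2), 2))  # {0}
print(recommended_order(2, ht_enabled=False))    # [(0, 0), (1, 0)]
```

With two busy threads, the recommended order places them on separate physical processors, while the package-major order would put both on one package and leave the other idle, which is the situation the recommendation is meant to avoid for software that is not Hyper-Threading-aware.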
Are there any licensing issues?
Each logical processor within a Hyper-Threading processor appears to the operating system as an individual processor. This means that Windows tools and services that display information about processors, such as Windows Task Manager or Windows Performance Monitor, will show information for every logical processor that Windows is using.
Intel’s processor identification methodology has been updated to support software identification of Hyper-Threading using the CPUID instruction. Operating system and application software can use this mechanism to detect Hyper-Threading processors and to support features such as Hyper-Threading-aware product licensing. Windows .NET Server supports an API that provides the logical-to-physical mapping for the processors in the system.
The current Windows licensing model for Hyper-Threading-enabled systems requires a processor license for each physical processor. However, it is important to note that any software product released before the introduction of Hyper-Threading will not detect Hyper-Threading and will treat each logical processor as if it were an individual physical processor.
This licensing model applies to all 32-bit versions of Windows XP and Windows .NET Server, and it delivers the performance benefit of using both logical processors on each processor that the Windows license covers. It also determines the processor limits of the 32-bit versions of Windows .NET Server and Windows XP.
If the BIOS lists seventeen Hyper-Threading processors, Windows .NET Datacenter Server will exhaust its 32-processor limit using both logical processors on the first 16 physical processors listed, and will not use either logical processor on the seventeenth. As the recommended BIOS ordering reflects, using a single logical processor on an idle physical Hyper-Threading processor provides better performance than using the second logical processor on a physical processor that already has an active logical processor.
As a result, Microsoft recommends disabling Hyper-Threading in the BIOS before installing or booting Windows on systems that contain more than 16 physical Hyper-Threading processors. Because the performance benefit of the second logical processors decreases as the number of physical processors in the system increases, the lack of Hyper-Threading support on such systems is not expected to have a significant impact on performance.
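The arithmetic behind the seventeen-processor example fits in a few lines. This is a simplified model, not Microsoft’s algorithm: it assumes the operating system takes both logical processors of each physical processor, in listing order, until a logical-processor limit (32 here, as in the Datacenter example) would be exceeded. The function name and parameters are hypothetical.

```python
from typing import Tuple


def processors_used(num_physical: int, logical_per_physical: int = 2,
                    logical_limit: int = 32) -> Tuple[int, int]:
    """Return (physical processors used, logical processors used).

    Simplified model: whole physical processors are taken in listing order,
    each contributing all of its logical processors, until the limit is hit.
    """
    physical = min(num_physical, logical_limit // logical_per_physical)
    return physical, physical * logical_per_physical


print(processors_used(17))                          # (16, 32): seventeenth unused
print(processors_used(17, logical_per_physical=1))  # (17, 17): Hyper-Threading disabled
print(processors_used(8))                           # (8, 16)
```

With Hyper-Threading disabled, all seventeen physical processors do useful work, which is why disabling it is the better choice on systems of that size.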
How do applications deal with Hyper-Threading?
Windows application software should run unmodified, and without error, on Hyper-Threading-enabled systems, and multithreaded applications generally run faster on a Hyper-Threading processor than on a similarly equipped single-threaded processor, by an amount that varies from application to application. To take full advantage of Hyper-Threading, however, software designers may want to modify their applications to support features such as Hyper-Threading-aware licensing and Hyper-Threading-aware thread affinity.
Applications must detect the presence of Hyper-Threading in order to enforce per-processor licensing rules in a Hyper-Threading-aware way, or to create a Hyper-Threading-aware execution environment for their processes and threads. To do this, applications use the system processor affinity mask. On Hyper-Threading-enabled systems, each logical processor is treated by the operating system as an individual processor and is represented by a bit in the system affinity mask. This is true for both Hyper-Threading-aware and non-Hyper-Threading-aware releases of Windows.
Detection requires the application to loop through each logical processor represented in the system processor affinity mask and set its own affinity to that processor. The information returned by the CPUID instruction can then be used to identify the physical processor on which the logical processor running the code resides. This lets the application build a list that relates each bit in the Windows processor affinity mask to a logical and a physical processor in the system.
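The detection loop described above can be sketched as follows. Because setting affinity and executing CPUID are platform-specific, the sketch takes a caller-supplied function, the hypothetical `read_apic_id_on(bit)`, which is assumed to pin the current thread to the logical processor for that affinity-mask bit and return the initial APIC ID from CPUID leaf 1. It then derives a physical package ID by shifting the APIC ID right by enough bits to cover the logical processors per package, in the spirit of Intel’s published detection guidance; the names and example APIC IDs are illustrative assumptions.

```python
from typing import Callable, Dict, List


def mask_bits(mask: int) -> List[int]:
    """Indexes of the set bits in a processor affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def map_logical_to_physical(system_mask: int,
                            read_apic_id_on: Callable[[int], int],
                            logical_per_package: int) -> Dict[int, List[int]]:
    """Group affinity-mask bits by physical package.

    read_apic_id_on(bit) is a hypothetical helper: pin to that logical
    processor, execute CPUID leaf 1 and return the initial APIC ID.
    """
    # Number of low APIC ID bits that select the logical processor within a package.
    shift = (logical_per_package - 1).bit_length()
    packages: Dict[int, List[int]] = {}
    for bit in mask_bits(system_mask):
        package_id = read_apic_id_on(bit) >> shift
        packages.setdefault(package_id, []).append(bit)
    return packages


# Illustrative two-package system listed in the recommended order:
# affinity bits 0 and 1 are first logical processors, bits 2 and 3 the second ones.
example_apic_ids = {0: 0, 1: 2, 2: 1, 3: 3}
print(map_logical_to_physical(0b1111, example_apic_ids.__getitem__, 2))
# {0: [0, 2], 1: [1, 3]}
```

The resulting mapping is what an application would consult both for per-physical-processor licensing checks and for choosing thread affinities.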
How can applications make better use of Hyper-Threading?
To get the most from a Hyper-Threading-enabled system, an application should ensure that the threads running on the two logical processors of a physical processor depend as little as possible on the same shared resources. With an understanding of how its threads and processes use the shared resources on a Hyper-Threading processor, an application can set processor affinity to minimize competition for those resources. For example, placing two busy threads on the two logical processors of one physical processor while another physical processor sits idle is a poor choice; spreading them across physical processors first is a better one.
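Good and bad affinity choices can be contrasted in a small placement policy. The sketch below assumes a mapping from physical package to affinity-mask bits, like the list the detection loop builds, and a set of bits already running busy threads. It prefers a logical processor on a completely idle physical processor (the good case) and only doubles up on a physical processor that already has a busy logical processor (the bad case) when nothing else is free. The function name is hypothetical, and a real application would pass the chosen bit to its operating system’s affinity call rather than just returning it.

```python
from typing import Dict, List, Optional, Set


def pick_logical_processor(packages: Dict[int, List[int]],
                           busy: Set[int]) -> Optional[int]:
    """Choose an affinity bit for a new busy thread.

    Prefer a logical processor on a physical package with nothing running;
    share a package with a busy logical processor only when every package
    already has work.
    """
    for bits in packages.values():
        if not busy.intersection(bits):
            return bits[0]  # whole package idle: use its first logical processor
    for bits in packages.values():
        for bit in bits:
            if bit not in busy:
                return bit  # share a package only as a last resort
    return None             # every logical processor is already busy


packages = {0: [0, 2], 1: [1, 3]}
print(pick_logical_processor(packages, busy={0}))           # 1
print(pick_logical_processor(packages, busy={0, 1}))        # 2
print(pick_logical_processor(packages, busy={0, 1, 2, 3}))  # None
```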
An application feature that could increase performance is the use of a YIELD (spin-wait) hint, which on IA-32 processors is the PAUSE instruction, in any code that spins tightly in a loop, particularly code waiting for access to shared data; otherwise the spinning thread consumes execution resources that the other logical processor could be using.
Care should be taken with any code that plans capacity based on the number of processors in the system. The two logical processors on one physical processor appear to applications as two processors, but typically provide only around 10% to 30% more performance than a similarly equipped non-Hyper-Threading processor. Any code that calculates capacity and generates load based on the processor count should check for Hyper-Threading-enabled processors and plan accordingly.
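To make the capacity point concrete, here is a hedged sketch that contrasts the naive processor count with an estimate based on the 10% to 30% figure quoted above. The function name and the fixed gain range are assumptions for illustration; the real gain depends heavily on the workload.

```python
from typing import Tuple


def capacity_estimate(physical: int, logical_per_physical: int,
                      ht_gain: Tuple[float, float] = (0.10, 0.30)
                      ) -> Tuple[int, Tuple[float, float]]:
    """Return (naive processor count, estimated capacity range).

    Capacity is expressed in single-processor equivalents. The naive count
    treats every logical processor as a full CPU; the estimate instead credits
    each physical Hyper-Threading processor with 1 + ht_gain of a CPU.
    """
    naive = physical * logical_per_physical
    if logical_per_physical == 1:  # no Hyper-Threading: both figures agree
        return naive, (float(physical), float(physical))
    low, high = ht_gain
    return naive, (physical * (1 + low), physical * (1 + high))


naive, (low, high) = capacity_estimate(physical=2, logical_per_physical=2)
print(naive, round(low, 2), round(high, 2))  # 4 2.2 2.6
```

A load generator that sized itself on the naive figure would plan for four processors’ worth of work on a machine that, by this estimate, offers roughly 2.2 to 2.6.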
Conclusion
Intel’s Hyper-Threading Technology brings the concept of simultaneous multithreading to the Intel Architecture. This is a significant new technology direction for Intel’s future processors, and it will become increasingly important because it offers a new way of obtaining additional performance at lower transistor and power cost.
In this implementation there are two logical processors on each physical processor. The logical processors have their own independent architectural state, but share nearly all of the processor’s physical execution and hardware resources. The goal was to implement the technology at minimal cost while ensuring forward progress on one logical processor even if the other is stalled, and to deliver full performance when only one logical processor is active. These goals were achieved through efficient logical processor selection algorithms and the creative partitioning and recombining of many key resources.
Measured performance gains with Hyper-Threading Technology reach up to 30% on common server application benchmarks. The potential of Hyper-Threading Technology is tremendous, and the current implementation has only begun to tap it. Hyper-Threading Technology is expected to be viable in everything from mobile processors to servers, and it shows that there are ways to increase performance other than the race for ever-faster clock speeds.
All trademarks are the property of their respective owners.