Floating Point and IEEE 754 Compliance for NVIDIA GPUs (2024)

In 2008 the IEEE 754 standard was revised to include the fused multiply-add operation (FMA). The FMA operation computes rn(X × Y + Z) with only one rounding step. Without the FMA operation the result would have to be computed as rn(rn(X × Y) + Z), with two rounding steps, one for the multiply and one for the add. Because the FMA uses only a single rounding step, the result is computed more accurately.

Let's consider an example to illustrate how the FMA operation works, using decimal arithmetic first for clarity. Let's compute x² − 1 with four digits of precision after the decimal point, or a total of five digits of precision including the leading digit before the decimal point.

For x = 1.0008, the correct mathematical result is x² − 1 = 1.60064 × 10⁻³. The closest number using only four digits after the decimal point is 1.6006 × 10⁻³. In this case rn(x² − 1) = 1.6006 × 10⁻³, which corresponds to the fused multiply-add operation rn(x × x + (−1)). The alternative is to compute separate multiply and add steps. For the multiply, x² = 1.00160064, so rn(x²) = 1.0016. The final result is rn(rn(x²) − 1) = 1.6000 × 10⁻³.

Rounding the multiply and add separately yields a result that is off by 0.00064 × 10⁻³. The corresponding FMA computation is off by only 0.00004 × 10⁻³, and its result is closest to the correct mathematical answer. The results are summarized below:

x = 1.0008
x² = 1.00160064
x² − 1 = 1.60064 × 10⁻³    (true value)
rn(x² − 1) = 1.6006 × 10⁻³    (fused multiply-add)
rn(x²) = 1.0016
rn(rn(x²) − 1) = 1.6000 × 10⁻³    (multiply, then add)

Below is another example, using binary single precision values:

A = 2⁰ × 1.00000000000000000000001
B = −2⁰ × 1.00000000000000000000010
rn(A × A + B) = 2⁻⁴⁶ × 1.00000000000000000000000
rn(rn(A × A) + B) = 0

In this particular case, computing rn(rn(A × A) + B) as an IEEE 754 multiply followed by an IEEE 754 add loses all bits of precision, and the computed result is 0. The alternative of computing the FMA rn(A × A + B) provides a result equal to the mathematical value. In general, the fused multiply-add operation generates more accurate results than computing one multiply followed by one add. The choice of whether or not to use the fused operation depends on whether the platform provides the operation and also on how the code is compiled.

Figure 1 shows CUDA C++ code and output corresponding to inputs A and B and operations from the example above. The code is executed on two different hardware platforms: an x86-class CPU using SSE in single precision, and an NVIDIA GPU with compute capability 2.0. At the time this paper was written (Spring 2011) there were no commercially available x86 CPUs offering hardware FMA. Because of this, the single-precision result computed with SSE would be 0. NVIDIA GPUs with compute capability 2.0 do offer hardware FMAs, so the result of executing this code will be the more accurate one by default. However, both results are correct according to the IEEE 754 standard. The code fragment was compiled without any special intrinsics or compiler options for either platform.

The fused multiply-add helps avoid loss of precision during subtractive cancellation. Subtractive cancellation occurs when quantities of similar magnitude and opposite sign are added: many of the leading bits cancel, leaving fewer meaningful bits of precision in the result. The fused multiply-add computes a double-width product during the multiplication, so even if subtractive cancellation occurs during the addition, enough valid bits remain in the product to deliver an accurate result.

Figure 1. Multiply and Add Code Fragment and Output for x86 and NVIDIA Fermi GPU

union {
    float f;
    unsigned int i;
} a, b;
float r;

a.i = 0x3F800001;
b.i = 0xBF800002;
r = a.f * a.f + b.f;

printf("a: %.8g\n", a.f);
printf("b: %.8g\n", b.f);
printf("r: %.8g\n", r);

x86-64 output:

a: 1.0000001
b: -1.0000002
r: 0

NVIDIA Fermi output:

a: 1.0000001
b: -1.0000002
r: 1.4210855e-14