The fact that ﬁnite precision introduces error means that we should consider how great the error is (by “error”, we mean the distance between two adjacent representable numbers), and whether it is acceptable for our application. As an example of a potential pitfall, consider representing one million in ﬂoating point, and then subtracting one million 1’s from it. We may still be left with a million if the error is greater than 1.
- 1 = In order to characterize error, range, and precision, we use the following notation:
- b = Base
- s = Number of signiﬁcant digits (not bits) in the fraction
- M = Largest exponent
- m = Smallest exponent
The number of signiﬁcant digits in the fraction is represented by s, which is different than the number of bits in the fraction if the base is anything other than 2 (for example, base 16 uses four bits for each digit). In general, if the base is 2k where k is an integer, then k bits are needed to represent each digit. The use of a hidden 1 increases s by one bit even though it does not increase the number of representable numbers. In the previous example, there are three signiﬁcant digits in the base 16 fraction and there are 12 bits that make up the three digits.
There are three bits in the excess 4 exponent, which gives an exponent range of [−22 to 22 − 1]. For this case, b = 16, s = 3, M = 3, and m = −4.In the analysis of a ﬂoating point representation, there are ﬁve characteristics that we consider: the number of representable numbers, the numbers that have the largest and smallest magnitudes (other than zero), and the sizes of the largest and smallest gaps between successive numbers.
The number of representable numbers can be determined as shown in Figure 2-8. The sign bit can take on two values, as indicated by the position marked with an encircled “A.” The total number of exponents is indicated in position B.
Figure 2-8 Calculation of the number of representable numbers in
a floating point representation.
Note that not all exponent bit patterns are valid in all representations. The IEEE 754 ﬂoating point standard, which we will study shortly, has a smallest exponent of −126 even though the eight-bit exponent can support a number as small as −128. The forbidden exponents are reserved for special numbers, such as zero and inﬁnity.
The ﬁrst digit of the fraction is considered next, which can take on any value except 0 in a normalized representation (except when a hidden 1 is used) as indicated by (b − 1) at position C. The remaining digits of the fraction can take on any of the b values for the base, as indicated by bs-1 at position D. If a hidden 1 is used, then position C is removed and position 4 is replaced with bs
Finally, there must be a representation for 0, which is accounted for in position E. Consider now the numbers with the smallest and largest magnitudes. The number with the smallest magnitude has the smallest exponent and the smallest nonzero normalized fraction. There must be a nonzero value in the ﬁrst digit, and since a 1 is the smallest value we can place there, the smallest fraction is b−1.
The number with the smallest magnitude is then bm·b−1 = bm−1. Similarly, the number with the largest magnitude has the largest exponent and the largest fraction (when the fraction is all 1’s), which is equal to bM ·(1 − b−s ).
The smallest and largest gaps are computed in a similar manner. The smallest gap occurs when the exponent is at its smallest value and the least signiﬁcant bit of the fraction changes. This gap is bm·b−s = bm−s
The largest gap occurs when the exponent is at its largest value and the least signiﬁcant bit of the fraction changes. This gap is bM·b−s = bM−s
As an example, consider a ﬂoating point representation in which there is a sign bit, a two-bit excess 2 exponent, and a three-bit normalized base 2 fraction in which the leading 1 is visible; that is, the leading 1 is not hidden. The representation of 0 is the bit pattern 000000. A number line showing all possible numbers that can be represented in this format is shown in Figure 2-9.
Notice that there is a relatively large gap between 0 and the ﬁrst representable number, because the normalized representation does not support bit patterns that correspond to numbers between 0 and the ﬁrst representable number. The smallest representable number occurs when the exponent and the fraction are at their smallest values. The smallest exponent is −2, and the smallest normalized fraction is (.100)2. The smallest representable number is then bm×b−1 = bm−1 = 2−2−1 = 1/8.
Similarly, the largest representable number occurs when the exponent and the fraction are both at their largest values. The largest fraction occurs when the fraction is all 1’s, which is a number that is 2−3 less than 1 since there are three digits in the fraction. The largest representable number is then bM ×(1 - b−s ) = 21 × (1 - 2−3) = 7/4.
The smallest gap occurs when the exponent is at its smallest value and the least signiﬁcant bit of the fraction changes, which is bm×b−s = bm−s = 2−2−3 = 1/32.
Similarly, the largest gap occurs when the exponent is at its largest value and the least signiﬁcant bit of the fraction changes, which is bM×b−s = bM−s = 21−3 = 1/4.
The number of bit patterns that represent valid numbers is less than the number of possible bit patterns, due to normalization. As discussed earlier, the number of representable numbers consists of ﬁve parts, which take into account the sign bit, the exponents, the ﬁrst signiﬁcant digit, the remaining digits, and the bit pattern for 0. This is computed as shown below:
2 × ((M − m) + 1) × (b − 1) × bs−1 + 1
= 2 × ((1 − (−2)) + 1) × (2 − 1) × 23−1 + 1
Notice that the gaps are small for small numbers and that the gaps are large for large numbers. In fact, the relative error is approximately the same for all numbers. If we take the ratio of a large gap to a large number, and compare that to the ratio of a small gap to a small number, then the ratios are the same:
The representation for a “small number” is used here, rather than the smallest number, because the large gap between zero and the ﬁrst representable number is a special case.