While there are some excellent sources on the theory behind fixed point, most of them present the facts without examples that show the ideas in action.
This page walks through the representation of fixed point numbers and some of their operations with worked examples. Other schemes exist, but the techniques here have been used in various projects and should be useful.
The reader is expected to have a basic understanding of the binary representation of signed and unsigned numbers and of arithmetic at the binary level.
Because fixed point is a way of interpreting the bits rather than a different binary encoding, adding or subtracting two fixed point numbers of the same format is the same as integer addition or subtraction.
Example using FIX_8_4 (8 bits, 4 of which are fractional):

| Operation | Decimal |
|---|---|
| | 4.50 |
| + | 0.75 |
| = | 5.25 |
Now the same example using binary addition in FIX_8_4 (the dot is only to emphasize the fractional part):
| Operation | Binary (FIX_8_4) |
|---|---|
| | 0100.1000 |
| + | 0000.1100 |
| = | 0101.0100 |
Naturally, overflow is still a possibility. But when handling fixed point logic, you can use normal integer addition and subtraction.
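As a rough illustration, the following Python sketch models a FIX_W_F value by its raw two's-complement integer. The helper names `to_signed`, `fix_add` and `fix_sub` are hypothetical, chosen only for this example. Addition is plain integer addition followed by wrapping to the bit width, which also shows what overflow looks like.

```python
# Hypothetical helper names, for illustration only.

def to_signed(raw: int, width: int) -> int:
    """Interpret the low `width` bits of `raw` as a two's-complement integer."""
    raw &= (1 << width) - 1
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw

def fix_add(a: int, b: int, width: int) -> int:
    """Add two raw values of the same format; wraps around on overflow."""
    return to_signed(a + b, width)

def fix_sub(a: int, b: int, width: int) -> int:
    """Subtract two raw values of the same format; wraps around on overflow."""
    return to_signed(a - b, width)

# FIX_8_4: 8 bits, 4 of which are fractional, so the scale factor is 2**4 = 16.
a = 0b0100_1000  # 4.50 -> raw 72
b = 0b0000_1100  # 0.75 -> raw 12
s = fix_add(a, b, 8)
print(f"{s:08b}", s / 16)  # 01010100 5.25

# Overflow: 7.5 + 1.0 exceeds the FIX_8_4 maximum of 7.9375 and wraps around.
print(fix_add(0b0111_1000, 0b0001_0000, 8) / 16)  # -7.5
```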
Multiplication is a bit trickier.
First, align the virtual dot of both numbers: pad the fractional part of the number with fewer fractional bits with zeros, then sign-extend both numbers until their lengths match.
Let's look at FIX_8_2 0.5 and FIX_4_3 -0.25. After alignment, both are in FIX_9_3:

| Format | Value | Original bits | Aligned bits (FIX_9_3) |
|---|---|---|---|
| FIX_8_2 | 0.5 | 000000.10 | 000000.100 |
| FIX_4_3 | -0.25 | 1.110 | 111111.110 |
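A minimal Python sketch of the alignment step, assuming raw two's-complement values; `to_signed`, `align` and `bits` are hypothetical helper names. Padding the fraction with zeros is a left shift by the difference in fractional bits, and sign extension comes from reinterpreting the value in the wider width.

```python
# Hypothetical helper names, for illustration only.

def to_signed(raw: int, width: int) -> int:
    """Interpret the low `width` bits of `raw` as a two's-complement integer."""
    raw &= (1 << width) - 1
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw

def align(raw: int, width: int, frac: int, new_width: int, new_frac: int) -> int:
    """Re-express a raw FIX_width_frac value in the wider FIX_new_width_new_frac."""
    if new_frac < frac or new_width - new_frac < width - frac:
        raise ValueError("target format must not lose integer or fractional bits")
    value = to_signed(raw, width)      # sign-extend (Python ints are unbounded)
    return value << (new_frac - frac)  # pad the fraction with zeros on the right

def bits(raw: int, width: int, frac: int) -> str:
    """Show the low `width` bits of `raw` with a dot before the fraction."""
    s = format(raw & ((1 << width) - 1), f"0{width}b")
    return s[: width - frac] + "." + s[width - frac :]

a = align(0b00000010, 8, 2, 9, 3)  # FIX_8_2 0.5
b = align(0b1110, 4, 3, 9, 3)      # FIX_4_3 -0.25
print(bits(a, 9, 3), bits(b, 9, 3))  # 000000.100 111111.110
```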
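Once both numbers share a format, multiply them as integers; the product has as many fractional bits as the two inputs combined. For example, multiplying 0.5 by 0.25, both in FIX_8_4, gives a 16-bit product with 8 fractional bits:

| Operation | Binary | Decimal |
|---|---|---|
| | 0000.1000 | 0.50 |
| * | 0000.0100 | 0.25 |
| = | 00000000.00100000 | 0.125 |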
The same procedure works for inputs with mixed signs. Let's multiply FIX_3_2 0.5 by FIX_3_2 -0.25; the result will be FIX_6_4.
Note that both inputs are sign-extended to double the bit width (6 bits) to get the correct product. Multiplying two 6-bit values gives what is in essence a 12-bit result, but the upper 6 bits are discarded.
| Operation | FIX_3_2 | Sign-extended to 6 bits | Decimal |
|---|---|---|---|
| | 0.10 | 0000.10 | 0.50 |
| * | 1.11 | 1111.11 | -0.25 |
| = (FIX_6_4) | | 11.1110 | -0.125 |
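The multiplication procedure can be sketched in Python as follows, again with hypothetical helper names (`to_signed`, `bits`, `fix_mul`) and assuming both inputs share one FIX_W_F format. It sign-extends both inputs to double width, multiplies the bit patterns as unsigned integers and keeps the lower 2W bits, reproducing both multiplication tables above.

```python
# Hypothetical helper names, for illustration only.

def to_signed(raw: int, width: int) -> int:
    """Interpret the low `width` bits of `raw` as a two's-complement integer."""
    raw &= (1 << width) - 1
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw

def bits(raw: int, width: int, frac: int) -> str:
    """Show the low `width` bits of `raw` with a dot before the fraction."""
    s = format(raw & ((1 << width) - 1), f"0{width}b")
    return s[: width - frac] + "." + s[width - frac :]

def fix_mul(a: int, b: int, width: int) -> int:
    """Multiply two raw values of the same width, returning 2*width product bits."""
    mask = (1 << (2 * width)) - 1
    ext_a = to_signed(a, width) & mask  # sign-extended 2*width-bit pattern
    ext_b = to_signed(b, width) & mask
    return (ext_a * ext_b) & mask       # discard the upper 2*width bits

# FIX_3_2: 0.5 * -0.25, product in FIX_6_4 (2 + 2 fractional bits).
p = fix_mul(0b010, 0b111, 3)
print(bits(p, 6, 4), to_signed(p, 6) / 2**4)  # 11.1110 -0.125

# FIX_8_4: 0.5 * 0.25, product has 16 bits, 8 of which are fractional.
p = fix_mul(0b0000_1000, 0b0000_0100, 8)
print(bits(p, 16, 8), to_signed(p, 16) / 2**8)  # 00000000.00100000 0.125
```

Because the exact signed product always fits in 2W bits, keeping the lower 2W bits of the unsigned product gives the correct two's-complement result.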
Division is similar to multiplication but requires a different alignment.
The number of fractional bits in the quotient is the number of fractional bits in the numerator minus the number in the denominator.
If we divide a numerator in UFIX_8_4 notation by a denominator in UFIX_8_4, the result will be a UFIX_4_0 number.
Now, if we divide a UFIX_12_8 number by a UFIX_8_4 number, the result will be a UFIX_8_4 number: the highest number of integer bits is 4, so the answer will have 4 integer bits.
The numerator has 8 fractional bits and the denominator has 4, so the quotient has 8 - 4 = 4 fractional bits.
This means the numerator needs at least as many fractional bits as the denominator for the integer part of the result to be complete.
If the denominator has more fractional bits than the numerator, the result is incomplete because its lowest integer bits are truncated.
Let's do an example: we will divide 2 by 8, both in UFIX_8_4 notation: 0x20 / 0x80. As noted before, the result will be in UFIX_4_0 notation, so the true quotient 0.25 truncates to 0.
| Operation | Hex | Binary |
|---|---|---|
| | 0x20 | 0010.0000 |
| / | 0x80 | 1000.0000 |
| = (UFIX_4_0) | 0x0 | 0000. |
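To keep fractional bits in the quotient, the numerator needs more fractional bits. Converting 2.0 to UFIX_12_8 notation shifts it 4 bits further to the left than in the first example, giving 0x200. The quotient is then in UFIX_8_4 notation: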
| Operation | Hex | Binary |
|---|---|---|
| | 0x200 | 0010.0000 0000 |
| / | 0x80 | 1000.0000 |
| = (UFIX_8_4) | 0x4 | 0000.0100 |
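For unsigned values, the division rule reduces to integer division of the raw values plus bookkeeping of the fractional bits. A Python sketch, where `ufix_div` is a hypothetical helper and Python's `//` on non-negative integers provides the truncation:

```python
# Hypothetical helper name, for illustration only (unsigned values only).

def ufix_div(num: int, num_frac: int, den: int, den_frac: int) -> tuple[int, int]:
    """Divide raw unsigned fixed point values using plain integer division.

    Returns (raw quotient, fractional bits of the quotient). The quotient has
    num_frac - den_frac fractional bits; a negative count means the lowest
    integer bits of the true quotient were truncated.
    """
    return num // den, num_frac - den_frac

# 2 / 8 with both in UFIX_8_4: 0 fractional bits, so 0.25 truncates to 0.
q, f = ufix_div(0x20, 4, 0x80, 4)
print(hex(q), f, q / 2**f)  # 0x0 0 0.0

# 2 in UFIX_12_8 divided by 8 in UFIX_8_4: 4 fractional bits survive.
q, f = ufix_div(0x200, 8, 0x80, 4)
print(hex(q), f, q / 2**f)  # 0x4 4 0.25
```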
To convert between decimal and fixed point, multiply or divide by the scale factor, which is 2 raised to the number of fractional bits: multiply to go from decimal to fixed point, divide to go back.
Example:
Let's say you want to convert the decimal value 1.05 to FIX_8_5 (8 bits, 5 of which are fractional). The scale factor follows from the position of the virtual dot: 2^5 = 32.
Now multiply the decimal value by the scale factor: 1.05 * 32 = 33.6. Rounding this to 34 (0x22) gives the 'raw' fixed point value.
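Decimal to fixed point conversion as a Python sketch; `to_fixed` is a hypothetical helper that assumes a signed FIX_W_F target and round-to-nearest. Note that Python's built-in `round()` breaks exact ties to the even integer, which may differ from the rounding scheme your project uses.

```python
# Hypothetical helper name, for illustration only.

def to_fixed(x: float, width: int, frac: int) -> int:
    """Convert a decimal to a raw signed FIX_width_frac value."""
    raw = round(x * (1 << frac))  # scale by 2**frac; round() ties go to even
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= raw <= hi:
        raise OverflowError(f"{x} does not fit in FIX_{width}_{frac}")
    return raw

raw = to_fixed(1.05, 8, 5)
print(raw, hex(raw))  # 34 0x22
```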
Example 2:
We have 0x42 as a FIX_8_7 number and we want to know its decimal value. The scale factor is 2^7 = 128.
So 0x42 / 128 = 66 / 128 = 0.515625, or 0.52 rounded.
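The reverse direction, sketched in Python with a hypothetical `from_fixed` helper that treats the raw value as two's complement before dividing by the scale factor:

```python
# Hypothetical helper name, for illustration only.

def from_fixed(raw: int, width: int, frac: int) -> float:
    """Interpret a raw two's-complement FIX_width_frac value as a decimal."""
    raw &= (1 << width) - 1
    if raw & (1 << (width - 1)):  # sign bit set: the value is negative
        raw -= 1 << width
    return raw / (1 << frac)      # divide by the scale factor 2**frac

print(from_fixed(0x42, 8, 7))  # 0.515625
```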