Saturday, June 15, 2013

Principles of Machine Mathematics (Part 3 of 3)

An Guide to Two's Complement from a Set Theoretical Perspective:

Part 3

Binary Numbers

If you look back to the ASCII table (http://www.asciitable.com/) that we referenced in part 1 of this post, you will notice that in addition to columns containing the decimal integers and the symbols, there are columns for hexadecimal and octal representations of integers and HTML. The reason hexadecimal and octal codes are useful is because they translate easily into binary numbers, which much more directly represent the way that computers store information.

Every conceivable kind of information that a computer stores or manipulates is represented in bits, which correspond to the voltage in some part of some circuit being either "high" or "low" according to some pre-defined thresholds with respect to some consistent reference value. The highness or lowness of voltages is a 2-state system that is very naturally represented by 0's and 1's, the only two digits used to represent numbers in a binary numeral system. This is a vast enough topic that it makes no sense for me to cover it in detail in my blog; see Wikipedia for details (https://en.wikipedia.org/wiki/Binary_number). One characteristic of interest, though, is that with positive numbers in binary notation, addition with pencil and paper works the same way that it does for decimal numbers, where you start from the one's place and carry overflow to the next lowest place.

The Sign & Magnitude Convention

Because computer's can't directly store arbitrary symbols or concepts such as a negative sign, the most straightforward way to represent negative numbers is to use a single bit (which can be 1 or 0) to represent the sign of a number (+ or -), since both a bit and a sign can have two states. The simplest way to do this is the sign and magnitude convention for storing negative numbers, where the first bit represents the sign, and all subsequent bits represent magnitude.

Consider the set of 3-bit binary numbers, ranging from 000 to 111. If we consider the leading 0 to represent + and the leading 1 to represent -, and the remaining bits to represent magnitude, then 001 =  1, 101 = -1, and 110 = -2. But how did we choose which negative numbers to assign to which bit strings? We chose the negative numbers with the same absolute value as the positive numbers with the same bit string (disregarding the sign bit). Remember, though, that the absolute value relation is NOT arithmetic invariant! Indded, if we perform arithmetic by pencil and paper as we do with decimal numbers, 001 + 101 = 101, which means 1 + -1 = -2. This doesn't make any sense. To make matters worse, not only is arithmetic with the sign & magnitude convention not intuitive on paper, it also requires more complex circuitry for the same reasons. This is one of the major reasons that the sign & magnitude convention is almost never used in modern computer systems.

Ones' Complement 

In ones' complement, the first bit still indicates sign, but the trailing bits indicate magnitude only for positive numbers. For negative numbers, where the first bit is 1, the magnitude of the number is determined by flipping the trailing bits. For example, 101 has a magnitude of flip(01) = 10 = 2. Therefore, 101 = -2. Similarly, 110 has a magnitude of 01 = 1. Thus, even though 001 + 101 = 110, this means that -2 + 1 = -1 in ones' complement, which makes sense.

Why does this addition work so naturally in ones' complement? It's because ones' complement secretly relies on a modular congruence relation. Every negative number in ones' complement and the positive number that is represented by the same trailing bit string are in the same congruence class modulo x, where x equals (2 raised to the number of trailing bits) minus 1. With three bits, this magic number is 3. Notice that 1 % 3 = 1 = -2 % 3.

The main reason ones' complement is not the standard for representing negative numbers is that it allows for a representation of negative zero (111). This is unfortunate because checking if a value is equal to zero is an extremely common operation in computer programming, and in ones' complement systems this operation is slightly inefficient because one has to perform two checks instead of one, since there are two binary strings that represent 0: 000 and 111 (in our imaginary 3-bit system).

Two's Complement

Two's complement is extremely similar to ones' complement. The difference is that obtaining the magnitude of a negative number can be done by flipping the bits and then adding one, instead of just flipping the bits. This modification is done to ensure that even the most positive negative number (111) is equal to -1 instead of 0, ensuring that there is only 1 representation of 0. As a result of this, it is actually possible to store more negative numbers than positive numbers.

Two's complement also secretly relies on a modular congruence relation, the relation modulo (2 raised to the number of bits). In our 3 bit system, there are 2 trailing bits, so negative numbers and positive numbers represented by the same bit string are equivalent modulo 4. For example, 111 = -1, and 011 = 3; -1 % 4 = 3.

Knowing the underlying modular congruence relation can be useful in some fringe cases. The trick of flipping the bits and adding one doesn't work in every case. For example, if you are working with non-standard information formats, such as 24-bit integers, you may run into problems. I once had to work with a device that transmitted information stored in 24-bit integers at regular intervals. I was reading this information into a computer using a USB cable.

Unfortunately, the programming environment I was using on the computer required standard sized integers. Therefore, I had to convert the 24-bit integers into 32-bit integers. To my chagrin, all the integers were being interpreted as positive values, even though the data was actually signed voltage information.   As a result, negative numbers were being interpreted as humongous positive numbers that were way off of the scale of the graph I was trying to display.

I tried casting the data explicitly to a signed integer, but that didn't work, because by left-padding the 24-bit integers with a byte of zeros to turn them into 32-bit integers, I had made all integers look positive to the computer! Whether I cast them as signed or unsigned, they represented the same positive value because of the leading zero. Next, I tried flipping the bits and adding 1, but that also flipped the padding zeros in the leading byte into 1s while turning the last 24 bytes into a small positive number, resulting in a humongous negative number instead of the small positive number storing the magnitude of the negative number, which was what I was trying to get at.

Finally, I ended up subtracting the incoming data by 2 raised to 24, which gave me exactly the values that I needed, because the interpreter was smart enough to realize that the result should be unsigned. This worked because adding or subtracting by a particular number produces a result which has the same remainder when divided by that number. For example, -1 % 4 = 3, and -1 + 4 = 3; 7 % 4 = 3, and 7 - 4 = 3. Do you see the pattern?

Decimal Complement

Another advantage of understanding how two's complement works from a set theoretical perspective is that we can extend the concept to arbitrary number systems, such as a hypothetical decimal system in which the functional units of our computer were somehow able to store 10 distinct states. In this case, we have 10 symbols, ranging in value from 0 to 9. Consider a 2-digit decimal string. There are 100 possible values we could store, ranging from 00 to 99.

If we want to store negative numbers, we could arbitrarily decide that any number greater than or equal to 50 will actually represent a negative number. That is, if the first digit is 5,6,7,8, or 9, this represents negative sign; if the first bit is 0,1,2,3, or 4, this represents positive sign. But which positive number will, for example, 55 represent? By analogy with 2's complement, we can just subtract from 100. 100 - 55 = 45, so 55 represents -45. Another way we can think about this is by "flipping" the digits across some imaginary threshold between 4 and 5 and then adding one. Thus 55 flipped becomes 44, and 44 + 1 = 45, so 55 = -45.

Notice that in this system, because there are 10 digits, the leading digit is able to simultaneously code for sign and magnitude! Isn't that cool? Such is the power of set theory and modular arithmetic.

No comments:

Post a Comment