/ computer science

Floating-Point Arithmetic in Computer Science

Representation in Bits

A floating point number is called floating point, because its point can float while expressing the number using scientific notation.

0.1 can be expressed as 100 x 10-3. The international IEEE 754 floating point standard defines a way to store a floating poing number in the binary system.


In single precision 23 bits are used to store the mantissa, 8 for the power of 2 and one for the sign (pos/neg number). In double precision 64 bits are used in total, called 53 bit precision.

The mantissa includes the digit to the left of the floating point and those on its right. The digit on the left will always be one (otherwise the point would be moved). A 32 bit representation of a decimal number (base 10) is called 24 bit precision, because the normal 23 bits are used and 1 bit for the digit to the left of the floating point.


Since a floating point number can not be represented 1:1 in a binary system, there will be a rounding error, something Computer Scientists measure in Ulps - units in the last place, varying by a factor of β. In case of the simple example of adding 0.1 and 0.2 together the rounding error would be $$ .00000000000000004/0.3 \approx 1 1/3 10^-16 $$

Taking into account β, the relative error would be 1.0 ϵ.

In order to avoid such errors, Guard Digits were introduced. However this would go beyond the scope of this article. I want to focus on the question how to mitigate rounding errors in applied programming.


Almost every programming language has a floating-point data type build in. Python 3 for example converts any integer you enter to a floating point numeric data type by default.

Try out the following in Python (v2 or 3)

>>> 0.1 + 0.2


When your code is interpreted, a floating point number, such as 0.1 will get rounded. This results in small rounding errors even before any actual calculation takes place.

Decimals cannot accurately be represented by the binary system, programming languages are rounding it as good as possible, usually the 17th decimal place.

A few numbers add up correctly, because the rounding errors cancel each other out.

Decimal Module

If you want to perform reliable arithmetic operations, you would have to use another data type than float. In Python you can do so by importing this module:

from decimal import *

Set the floating point precision:

getcontext().prec = 6

Now try doing math with the Decimal module:

Decimal(0.1) + Decimal(0.2)

This is also called fixed-point arithmetic, since if you would try a larger precision you would get a significant larger mantissa that does not get rounded to the 17th place of the fractional part:

>>> getcontext().prec = 30

>>> Decimal(0.1)+Decimal(0.2)


Also try to hand over your decimal as a string to the constructor of the Decimal class and you can get rid once and for all of the rounding issue.

>>> Decimal('0.1') + Decimal('0.2')


In above example a floating point is converted to a string before a decimal gets created by the Decimal class. Now you can deal with exactly the number of digits you had in mind.

This is pointed out in greater detail at Doug Hellmann's Python Module of the Week. It also may be wise to consult the official Python documentation.

Floating-Point Arithmetic in Computer Science
Share this