Second Generation Programmer

I have benefitted greatly from multiple readings of Elements of Programming by Alexander Stepanov and Paul McJones as well as From Mathematics to Generic Programming by Stepanov and Daniel Rose. Each time I read either work, I learn something new.

Multiply Like An Egyptian

One algorithm that receives extensive coverage in both books is the Egyptian multiplication algorithm. Here’s a simple, unoptimized version of that algorithm from Elements of Programming:

template<typename I, typename Op>
    requires(Integer(I) && BinaryOperation(Op))
Domain(Op) power_0(Domain(Op) a, I n, Op op)
{
    // Precondition: associative(op) ^ n > 0
    if (n == I(1)) return a;
    if (n % I(2) == I(0))
    return power_0(op(a, a), n / I(2), op);
    return op(power_0(op(a, a), n / I(2), op), a);
}

If you use plus<int> as the binary operation, the algorithm will perform multiplication (n x a) in O(log₂(n)) steps. If you take the exact same code, but use multiplies<int> as the binary operation, the algorithm will perform the operation of raising to a power (aⁿ), again in O(log₂(n)) steps. You’re not stuck with integers, either. The algorithm will work with any type that has an associative binary operation (a.k.a. a semigroup). Examples of semigroups include real numbers and polynomials.

This is one of the fundamental themes of Stepanov’s work. Take an algorithm that does something useful in a specific context, figure out the minimal requirements for the types used by that algorithm, and then abstract or expand the algorithm to do something useful in many broader contexts.

Divide Like An Egyptian

The Egyptian multiplication/power algorithm works by breaking down the big problem into smaller steps. At each step, the value of a is doubled and the value of n is halved. The Egyptians used the same idea in reverse to perform division with remainder. Again, here’s a simple, unoptimized version of that algorithm from Elements of Programming:

template<typename T>
    requires(ArchimedeanMonoid(T))
pair<QuotientType(T), T>
quotient_remainder_nonnegative(T a, T b)
{
    // Precondition: a >= 0 ^ b > 0
    typedef QuotientType(T) N;
    if (a < b) return pair<N, T>(N(0), a);
    if (a - b < b) return pair<N, T>(N(1), a - b);
    pair<N, T> q = quotient_remainder_nonnegative(a, b + b);
    N m = twice(q.m0);
    a = q.m1;
    if (a < b) return pair<N, T>(m, a);
    else       return pair<N, T>(successor(m), a - b);
}

The use of addition and subtraction in the algorithm presented above intrigued me. For the Egyptian multiplication algorithm, we started out using addition to yield multiplication. When we plugged in multiplication instead, we got powers. What happens when we replace addition and subtraction in the Egyptian division algorithm with multiplication and division? For the mathematically astute (or those who read the headline), the answer is logarithms!

Here’s the same algorithm tweaked by me to allow for different operations to be used in place of the originally hardcoded addition and subtraction:

template<typename Op>
    requires(HomogeneousFunction(Op) && Arity(Op) == 2 &&
        ArchimedeanGroup(Domain(Op)) &&
        Codomain(Op) == Domain(Op))
pair<QuotientType(Domain(Op)), Domain(Op)>
quotient_remainder_nonnegative_0(Domain(Op) a, Domain(Op) b, Op op)
{
    // Precondition: a >= 0 ^ b > 0
    typedef Domain(Op) T;
    typedef QuotientType(T) N;
    if (a < b) return pair<N, T>(N(0), a);
    if (inverse(op)(a, b) < b) return pair<N, T>(N(1), inverse(op)(a, b));
    pair<N, T> q = quotient_remainder_nonnegative_0(a, op(b, b), op);
    N m = twice(q.m0);
    a = q.m1;
    if (a < b) return pair<N, T>(m, a);
    else       return pair<N, T>(successor(m), inverse(op)(a, b));
}

Inverse Operations

In From Mathematics to Generic Programming, Stepanov and Rose introduce the inverse_operation function, where the inverse_operation of addition is unary negation and the inverse_operation of multiplication is taking the reciprocal. Thus, performing subtraction (c = a – b) using their notation looks like

T c = op(a, inverse_operation(op)(b));

Instead, I introduce the inverse function, where the inverse of addition is subtraction and the inverse of multiplication is division. Performing subtraction using my notation looks like

T c = inverse(op)(a, b);

I believe that the two notations are functionally equivalent. I chose my notation for the sake of simplicity and performance. I found that performing a single division is faster than taking a reciprocal and multiplying.

What’s a Remainder?

If I call quotient_remainder_nonnegative_0(83, 3, multiplies<long long>()), I get a quotient (logarithm, really) of 4 and a remainder of 1. That seems about right. 3⁴ = 81 which is the power of 3 closest to 83 without going over. But what does a remainder mean when taking logarithms?

Let’s go back to the quotients and remainders we learned in grade school. 83 / 3 = 27 remainder 2 means that we have to multiply 27 by 3 to get 81 and add 2 to get 83. Similarly, log₃(83) = 4 remainder 83/81 means we have to raise 3 to the 4^th power to get 81 and multiply by 83/81 to get 83.

But we didn’t get a remainder of 83/81. We got a remainder of 1 instead. What’s wrong? If we look at the requirements for calling quotient_remainder_nonnegative_0, we note that the domain of Op should be an Archimedean Group. A group has an associative operation, an identity element, and an inverse operation. In our case, the associative operation is multiplication of long long integers, the identity element is 1, and the inverse operation is division of long long integers. However, integers don’t form a group under multiplication and division, because dividing one integer by another often leads to a non-integer (fractional) result. This is another way of saying that integers are not closed under division. Unfortunately, the requirements on quotient_remainder_nonnegative_0 are currently only documentation for the user. Current C++ compilers have no way of enforcing them.

If we want meaningful remainder results, we’ll have to use a type that is closed under division and does form a group. Rational numbers meet this criteria. I chose a rational implementation from the boost c++ libraries. Calling quotient_remainder_nonnegative_0(83, 3, multiplies<boost::rational<long long>>) does yield the expected results of 4 and 83/81.

Simple Optimizations

The first optimization that I made to the above algorithm is to remember the result of inverse(op)(a, b) and reuse it rather than recalculate it.

template<typename Op>
    requires(HomogeneousFunction(Op) && Arity(Op) == 2 &&
        ArchimedeanGroup(Domain(Op)) &&
        Codomain(Op) == Domain(Op))
pair<QuotientType(Domain(Op)), Domain(Op)>
quotient_remainder_nonnegative_1(Domain(Op) a, Domain(Op) b, Op op)
{
    // Precondition: a >= 0 ^ b > 0
    typedef Domain(Op) T;
    typedef QuotientType(T) N;
    if (a < b) return pair<N, T>(N(0), a);
    T const c = inverse(op)(a, b);
    if (c < b) return pair<N, T>(N(1), c);
    pair<N, T> q = quotient_remainder_nonnegative_1(a, op(b, b), op);
    N m = twice(q.m0);
    a = q.m1;
    if (a < b) return pair<N, T>(m, a);
    else       return pair<N, T>(successor(m), inverse(op)(a, b));
}

In the original quotient_remainder_nonnegative algorithm, Stepanov and McJones deliberately chose to test if(a - b < b) rather than if(a < b + b) to avoid overflow. If we ignore overflow and assume that op(a, b) is faster than inverse(op)(a, b), it leads us to another optimization:

template<typename Op>
    requires(HomogeneousFunction(Op) && Arity(Op) == 2 &&
        ArchimedeanGroup(Domain(Op)) &&
        Codomain(Op) == Domain(Op))
pair<QuotientType(Domain(Op)), Domain(Op)>
quotient_remainder_nonnegative_2(Domain(Op) a, Domain(Op) b, Op op)
{
    // Precondition: a >= 0 ^ b > 0
    typedef Domain(Op) T;
    typedef QuotientType(T) N;
    if (a < b) return pair<N, T>(N(0), a);
    T const c = op(b, b);
    if (a < c) return pair<N, T>(N(1), inverse(op)(a, b));
    pair<N, T> q = quotient_remainder_nonnegative_2(a, c, op);
    N m = twice(q.m0);
    a = q.m1;
    if (a < b) return pair<N, T>(m, a);
    else       return pair<N, T>(successor(m), inverse(op)(a, b));
}

More Advanced Optimization Attempts

What we have so far is an algorithm that works in O(log₂(n)) time and uses O(log₂(n)) space. Can we do better in terms of space or time? In Elements of Programming, Stepanov and McJones provide the algorithm quotient_remainder_nonnegative_iterative that calculates quotients and remainders in O(log₂(n)) time using O(1) space. However, this algorithm requires T to be a HalvableMonoid. What does “half” mean in the context of integers and multiplication? a = b * b ⇒ half(a) = b. In other words, “halving” with respect to multiplication means taking the square root. We don’t have a way to accomplish that easily other than by taking the logarithm and dividing by two, so we’re back where we started.

Fibonacci to the Rescue?

On the Elements of Programming web site, Stepanov and McJones provide an algorithm for computing remainders (attributed to Floyd and Knuth) entitled remainder_nonnegative_fibonacci. The same algorithm appears as remainder_fibonacci in From Mathematics to Generic Programming. The algorithm has O(log₂(n)) time complexity and O(1) space complexity. I have extended the algorithm to provide both quotients and remainders:

template<typename Op>
std::pair<Domain(Op), Domain(Op)> next_fibonacci(Domain(Op) x, Domain(Op) y, Op op)
{
    return std::pair<Domain(Op), Domain(Op)>(y, op(x, y));
}

template<typename Op>
std::pair<Domain(Op), Domain(Op)> previous_fibonacci(Domain(Op) x, Domain(Op) y, Op op) {
    return std::pair<Domain(Op), Domain(Op)>(inverse(op)(y, x), x);
}

template<typename Op>
    requires(HomogeneousFunction(Op) && Arity(Op) == 2 &&
        ArchimedeanGroup(Domain(Op)) &&
        Codomain(Op) == Domain(Op))
pair<QuotientType(Domain(Op)), Domain(Op)>
quotient_remainder_nonnegative_fibonacci(Domain(Op) a, Domain(Op) b, Op op)
{
    // Precondition: a >= 0 ^ b > 0
    typedef Domain(Op) T;
    typedef QuotientType(T) N;
    N n = N(0);
    if (a < b) return pair<N, T>(n, a);
    T c = b;
    N d = N(1), e = N(1);
    plus<N> integer_op;
    do {
        std::tie(b, c) = next_fibonacci(b, c, op);
        std::tie(d, e) = next_fibonacci(d, e, integer_op);
    } while (a >= c);
    do {
        if (a >= b)
        {
            a = inverse(op)(a, b);
            n = integer_op(n, d);
        }
        std::tie(b, c) = previous_fibonacci(b, c, op);
        std::tie(d, e) = previous_fibonacci(d, e, integer_op);
    } while (b < c);
    return pair<N, T>(n, a);
}

Although it has the same time complexity and better space complexity than the quotient_remainder_nonnegative algorithms, quotient_remainder_nonnegative_fibonacci fared relatively poorly in my benchmarks. This is not surprising given that Stepanov and McJones indicate that this algorithm performs about 31% more operations than the quotient_remainder_nonnegative algorithms.

Jon Brandt to the Rescue?

There’s another fixed space algorithm hiding not in the pages of Elements of Programming but in the source code for that book. The algorithm is remainder_nonnegative_with_largest_doubling attributed to Jon Brandt. I extended it to compute quotients and to take an operation according to the standard pattern I’ve followed thus far:

template<typename Op>
    requires(HomogeneousFunction(Op) && Arity(Op) == 2 &&
        ArchimedeanGroup(Domain(Op)) &&
        Codomain(Op) == Domain(Op))
pair<QuotientType(Domain(Op)), Domain(Op)>
largest_doubling(Domain(Op) a, Domain(Op) b, Op op)
{
    // Precondition: a >= b > 0
    typedef Domain(Op) T;
    typedef QuotientType(T) N;
    N m = N(1);
    T c = op(b, b);
    while (c <= a)
    {
        m = twice(m);
        b = c;
        c = op(c, c);
    }
    return pair<N, T>(m, b);
}

template<typename Op>
    requires(HomogeneousFunction(Op) && Arity(Op) == 2 &&
        ArchimedeanGroup(Domain(Op)) &&
        Codomain(Op) == Domain(Op))
pair<QuotientType(Domain(Op)), Domain(Op)>
quotient_remainder_nonnegative_with_largest_doubling(Domain(Op) a, Domain(Op) b, Op op)
{
    // Precondition: a >= T(0) ^ b > T(0)
    typedef Domain(Op) T;
    typedef QuotientType(T) N;
    N quotient = N(0);
    while (b <= a)
    {
        pair<N, T> const m = largest_doubling(a, b, op);
        quotient += m.m0;
        a = inverse(op)(a, m.m1);
    }
    return pair<N, T>(quotient, a);
}

quotient_remainder_nonnegative_with_largest_doubling consistently placed second in all of my benchmarks.

Prior Art

I haven’t been able to locate any references that deal with extending quotient and remainder algorithms by abstracting the operation type to calculate logarithms. This may be due to poor research skills on my part, the sheer obviousness of the conclusions I’ve drawn, or, hopefully, the novelty of my solution. Please let me know in the comments below.

The one approach I found that is very similar to mine is one taken by the code for calculating integer logarithms in the Scheme Standard Prelude. It is described in a Programming Praxis blog entry. They use the doubling technique to figure out which powers of 2 bound the logarithm, and then use a binary search between those two powers to find the answer. Here’s a translation of their Scheme code:

template<typename Op>
    requires(HomogeneousFunction(Op) && Arity(Op) == 2 &&
        ArchimedeanGroup(Domain(Op)) &&
        Codomain(Op) == Domain(Op))
pair<QuotientType(Domain(Op)), Domain(Op)>
ilog(Domain(Op) a, Domain(Op) b, Op op)
{
    typedef Domain(Op) T;
    typedef QuotientType(T) N;
    N lo(0), hi(1);
    T b_lo(identity_element(op)), b_hi(b);
    while (b_hi < a)
    {
        lo = hi;
        b_lo = b_hi;
        hi = twice(hi);
        b_hi = op(b_hi, b_hi);
    }
    while (hi - lo > 1)
    {
        N const mid = (lo + hi) / 2;
        T const b_mid = op(power(b, mid - lo, op), b_lo);
        if (a < b_mid)
        {
            hi = mid;
            b_hi = b_mid;
        }
        else if (b_mid < a)
        {
            lo = mid;
            b_lo = b_mid;
        }
        else
            return pair<N, T>(mid, inverse(op)(a, b_mid));
    }
    return b_hi == a ? pair<N, T>(hi, inverse(op)(a, b_hi)) : pair<N, T>(lo, inverse(op)(a, b_lo));
}

This method also fared well in my benchmarks.

Benchmarks

I’m using the Visual Studio 2013 Compiler with the /O2 flag running on Windows 7.

I tested all of the algorithms using 3 different types for op:

plus<long long>
multiplies<long long>
multiplies<boost::rational<long long>>

For a I used values from 1 to 1,000,000. For b I used the values 2, 3, 5, and 7. I borrowed this testing regimen from a Bonsai Code blog entry.

Setting op to plus<long long> calculates the normal quotient and remainder you remember fondly from elementary school. I benchmarked all of the algorithms and compared them against the builtin C++ operators for division and remainder.

algorithm	time (s)
builtin_quotient_remainder	0.05
quotient_remainder_nonnegative_with_largest_doubling	0.17
quotient_remainder_nonnegative_fibonacci	0.20
quotient_remainder_nonnegative_2	0.35
quotient_remainder_nonnegative_1	0.36
quotient_remainder_nonnegative_0	0.38
ilog	0.39

Suffice it to say that if you’re doing integer division and remainder, use the builtin C++ operators :).

Setting op to multiplies<long long> calculates integer logarithms without returning a meaningful remainder. Sometimes, all you need is a rough estimate of the logarithm.

algorithm	time (s)
ilog	0.08
quotient_remainder_nonnegative_with_largest_doubling	0.10
quotient_remainder_nonnegative_2	0.13
quotient_remainder_nonnegative_1	0.21
quotient_remainder_nonnegative_0	0.22
quotient_remainder_nonnegative_fibonacci	0.27

Setting op to multiplies<boost::rational<long long>> calculates integer logarithms with rational remainders. This provides the most information, but rational arithmetic is slow compared to integer arithmetic.

algorithm	time (s)
quotient_remainder_nonnegative_2	2.99
quotient_remainder_nonnegative_with_largest_doubling	3.09
ilog	3.35
quotient_remainder_nonnegative_1	4.26
quotient_remainder_nonnegative_fibonacci	4.42
quotient_remainder_nonnegative_0	4.98

Conclusions

The fundamental insight I’m trying to convey is that it is possible to extend the existing Egyptian division algorithm to use operations other than addition and subtraction and still yield useful results. Substituting multiplication and division, for example, leads to logarithms and remainders.

There are several variations on the algorithms that have differing performance characteristics depending upon the types and operations in use. Try several to see which combination works best for your application.

Everything I know about abstract algebra, I’ve learned from videos and books by Alex Stepanov and his colleagues. I’m grateful to them for what I’ve learned, but I have much more to learn. There are surely errors in terminology, notation, and code above. Please gently point out my mistakes in the comments below or drop me an email.

Here’s the complete set of code and benchmarks in a single translation unit. The first part of the file is taken directly from header files on the Elements of Programming site. That code is copyrighted by Alexander Stepanov and Paul McJones. I kept only what I needed for these algorithms to keep the file short. You should use the real header files from the site because they contain many more useful algorithms. The rest of the file is copyrighted by me, David Sanders.