I have benefitted greatly from multiple readings of *Elements of Programming* by Alexander Stepanov and Paul McJones as well as *From Mathematics to Generic Programming* by Stepanov and Daniel Rose. Each time I read either work, I learn something new.

### Multiply Like An Egyptian

One algorithm that receives extensive coverage in both books is the Egyptian multiplication algorithm. Here’s a simple, unoptimized version of that algorithm from *Elements of Programming*:

template<typename I, typename Op> requires(Integer(I) && BinaryOperation(Op)) Domain(Op) power_0(Domain(Op) a, I n, Op op) { // Precondition: associative(op) ^ n > 0 if (n == I(1)) return a; if (n % I(2) == I(0)) return power_0(op(a, a), n / I(2), op); return op(power_0(op(a, a), n / I(2), op), a); }

If you use `plus<int>`

as the binary operation, the algorithm will perform multiplication (n x a) in O(log_{2}(n)) steps. If you take the exact same code, but use `multiplies<int>`

as the binary operation, the algorithm will perform the operation of raising to a power (a^{n}), again in O(log_{2}(n)) steps. You’re not stuck with integers, either. The algorithm will work with any type that has an associative binary operation (a.k.a. a semigroup). Examples of semigroups include real numbers and polynomials.

This is one of the fundamental themes of Stepanov’s work. Take an algorithm that does something useful in a specific context, figure out the minimal requirements for the types used by that algorithm, and then abstract or expand the algorithm to do something useful in many broader contexts.

### Divide Like An Egyptian

The Egyptian multiplication/power algorithm works by breaking down the big problem into smaller steps. At each step, the value of `a`

is doubled and the value of `n`

is halved. The Egyptians used the same idea in reverse to perform division with remainder. Again, here’s a simple, unoptimized version of that algorithm from *Elements of Programming*:

template<typename T> requires(ArchimedeanMonoid(T)) pair<QuotientType(T), T> quotient_remainder_nonnegative(T a, T b) { // Precondition: a >= 0 ^ b > 0 typedef QuotientType(T) N; if (a < b) return pair<N, T>(N(0), a); if (a - b < b) return pair<N, T>(N(1), a - b); pair<N, T> q = quotient_remainder_nonnegative(a, b + b); N m = twice(q.m0); a = q.m1; if (a < b) return pair<N, T>(m, a); else return pair<N, T>(successor(m), a - b); }

The use of addition and subtraction in the algorithm presented above intrigued me. For the Egyptian multiplication algorithm, we started out using addition to yield multiplication. When we plugged in multiplication instead, we got powers. What happens when we replace addition and subtraction in the Egyptian division algorithm with multiplication and division? For the mathematically astute (or those who read the headline), the answer is *logarithms*!

Here’s the same algorithm tweaked by me to allow for different operations to be used in place of the originally hardcoded addition and subtraction:

template<typename Op> requires(HomogeneousFunction(Op) && Arity(Op) == 2 && ArchimedeanGroup(Domain(Op)) && Codomain(Op) == Domain(Op)) pair<QuotientType(Domain(Op)), Domain(Op)> quotient_remainder_nonnegative_0(Domain(Op) a, Domain(Op) b, Op op) { // Precondition: a >= 0 ^ b > 0 typedef Domain(Op) T; typedef QuotientType(T) N; if (a < b) return pair<N, T>(N(0), a); if (inverse(op)(a, b) < b) return pair<N, T>(N(1), inverse(op)(a, b)); pair<N, T> q = quotient_remainder_nonnegative_0(a, op(b, b), op); N m = twice(q.m0); a = q.m1; if (a < b) return pair<N, T>(m, a); else return pair<N, T>(successor(m), inverse(op)(a, b)); }

### Inverse Operations

In *From Mathematics to Generic Programming*, Stepanov and Rose introduce the `inverse_operation`

function, where the `inverse_operation`

of addition is unary negation and the `inverse_operation`

of multiplication is taking the reciprocal. Thus, performing subtraction (c = a – b) using their notation looks like

T c = op(a, inverse_operation(op)(b));

Instead, I introduce the `inverse`

function, where the `inverse`

of addition is subtraction and the `inverse`

of multiplication is division. Performing subtraction using my notation looks like

T c = inverse(op)(a, b);

I believe that the two notations are functionally equivalent. I chose my notation for the sake of simplicity and performance. I found that performing a single division is faster than taking a reciprocal and multiplying.

### What’s a Remainder?

If I call `quotient_remainder_nonnegative_0(83, 3, multiplies<long long>())`

, I get a quotient (logarithm, really) of 4 and a remainder of 1. That seems about right. 3^{4} = 81 which is the power of 3 closest to 83 without going over. But what does a remainder mean when taking logarithms?

Let’s go back to the quotients and remainders we learned in grade school. 83 / 3 = 27 remainder 2 means that we have to multiply 27 by 3 to get 81 and add 2 to get 83. Similarly, log_{3}(83) = 4 remainder 83/81 means we have to raise 3 to the 4^{th} power to get 81 and *multiply* by 83/81 to get 83.

But we didn’t get a remainder of 83/81. We got a remainder of 1 instead. What’s wrong? If we look at the requirements for calling `quotient_remainder_nonnegative_0`

, we note that the domain of `Op`

should be an Archimedean Group. A group has an associative operation, an identity element, and an inverse operation. In our case, the associative operation is multiplication of `long long`

integers, the identity element is 1, and the inverse operation is division of `long long`

integers. However, integers don’t form a group under multiplication and division, because dividing one integer by another often leads to a non-integer (fractional) result. This is another way of saying that integers are not *closed* under division. Unfortunately, the requirements on `quotient_remainder_nonnegative_0`

are currently only documentation for the user. Current C++ compilers have no way of enforcing them.

If we want meaningful remainder results, we’ll have to use a type that *is* closed under division and *does* form a group. Rational numbers meet this criteria. I chose a rational implementation from the *boost* c++ libraries. Calling `quotient_remainder_nonnegative_0(83, 3, multiplies<boost::rational<long long>>)`

does yield the expected results of 4 and 83/81.

### Simple Optimizations

The first optimization that I made to the above algorithm is to remember the result of `inverse(op)(a, b)`

and reuse it rather than recalculate it.

template<typename Op> requires(HomogeneousFunction(Op) && Arity(Op) == 2 && ArchimedeanGroup(Domain(Op)) && Codomain(Op) == Domain(Op)) pair<QuotientType(Domain(Op)), Domain(Op)> quotient_remainder_nonnegative_1(Domain(Op) a, Domain(Op) b, Op op) { // Precondition: a >= 0 ^ b > 0 typedef Domain(Op) T; typedef QuotientType(T) N; if (a < b) return pair<N, T>(N(0), a); T const c = inverse(op)(a, b); if (c < b) return pair<N, T>(N(1), c); pair<N, T> q = quotient_remainder_nonnegative_1(a, op(b, b), op); N m = twice(q.m0); a = q.m1; if (a < b) return pair<N, T>(m, a); else return pair<N, T>(successor(m), inverse(op)(a, b)); }

In the original `quotient_remainder_nonnegative`

algorithm, Stepanov and McJones deliberately chose to test `if(a - b < b)`

rather than `if(a < b + b)`

to avoid overflow. If we ignore overflow and assume that `op(a, b)`

is faster than `inverse(op)(a, b)`

, it leads us to another optimization:

template<typename Op> requires(HomogeneousFunction(Op) && Arity(Op) == 2 && ArchimedeanGroup(Domain(Op)) && Codomain(Op) == Domain(Op)) pair<QuotientType(Domain(Op)), Domain(Op)> quotient_remainder_nonnegative_2(Domain(Op) a, Domain(Op) b, Op op) { // Precondition: a >= 0 ^ b > 0 typedef Domain(Op) T; typedef QuotientType(T) N; if (a < b) return pair<N, T>(N(0), a); T const c = op(b, b); if (a < c) return pair<N, T>(N(1), inverse(op)(a, b)); pair<N, T> q = quotient_remainder_nonnegative_2(a, c, op); N m = twice(q.m0); a = q.m1; if (a < b) return pair<N, T>(m, a); else return pair<N, T>(successor(m), inverse(op)(a, b)); }

### More Advanced Optimization Attempts

What we have so far is an algorithm that works in O(log_{2}(n)) time and uses O(log_{2}(n)) space. Can we do better in terms of space or time? In *Elements of Programming*, Stepanov and McJones provide the algorithm `quotient_remainder_nonnegative_iterative`

that calculates quotients and remainders in O(log_{2}(n)) time using O(1) space. However, this algorithm requires `T`

to be a `HalvableMonoid`

. What does “half” mean in the context of integers and multiplication? a = b * b ⇒ half(a) = b. In other words, “halving” with respect to multiplication means taking the square root. We don’t have a way to accomplish that easily other than by taking the logarithm and dividing by two, so we’re back where we started.

### Fibonacci to the Rescue?

On the *Elements of Programming* web site, Stepanov and McJones provide an algorithm for computing remainders (attributed to Floyd and Knuth) entitled `remainder_nonnegative_fibonacci`

. The same algorithm appears as `remainder_fibonacci`

in *From Mathematics to Generic Programming*. The algorithm has O(log_{2}(n)) time complexity and O(1) space complexity. I have extended the algorithm to provide both quotients and remainders:

template<typename Op> std::pair<Domain(Op), Domain(Op)> next_fibonacci(Domain(Op) x, Domain(Op) y, Op op) { return std::pair<Domain(Op), Domain(Op)>(y, op(x, y)); } template<typename Op> std::pair<Domain(Op), Domain(Op)> previous_fibonacci(Domain(Op) x, Domain(Op) y, Op op) { return std::pair<Domain(Op), Domain(Op)>(inverse(op)(y, x), x); } template<typename Op> requires(HomogeneousFunction(Op) && Arity(Op) == 2 && ArchimedeanGroup(Domain(Op)) && Codomain(Op) == Domain(Op)) pair<QuotientType(Domain(Op)), Domain(Op)> quotient_remainder_nonnegative_fibonacci(Domain(Op) a, Domain(Op) b, Op op) { // Precondition: a >= 0 ^ b > 0 typedef Domain(Op) T; typedef QuotientType(T) N; N n = N(0); if (a < b) return pair<N, T>(n, a); T c = b; N d = N(1), e = N(1); plus<N> integer_op; do { std::tie(b, c) = next_fibonacci(b, c, op); std::tie(d, e) = next_fibonacci(d, e, integer_op); } while (a >= c); do { if (a >= b) { a = inverse(op)(a, b); n = integer_op(n, d); } std::tie(b, c) = previous_fibonacci(b, c, op); std::tie(d, e) = previous_fibonacci(d, e, integer_op); } while (b < c); return pair<N, T>(n, a); }

Although it has the same time complexity and better space complexity than the `quotient_remainder_nonnegative`

algorithms, `quotient_remainder_nonnegative_fibonacci`

fared relatively poorly in my benchmarks. This is not surprising given that Stepanov and McJones indicate that this algorithm performs about 31% more operations than the `quotient_remainder_nonnegative`

algorithms.

### Jon Brandt to the Rescue?

There’s another fixed space algorithm hiding not in the pages of *Elements of Programming* but in the source code for that book. The algorithm is `remainder_nonnegative_with_largest_doubling`

attributed to Jon Brandt. I extended it to compute quotients and to take an operation according to the standard pattern I’ve followed thus far:

template<typename Op> requires(HomogeneousFunction(Op) && Arity(Op) == 2 && ArchimedeanGroup(Domain(Op)) && Codomain(Op) == Domain(Op)) pair<QuotientType(Domain(Op)), Domain(Op)> largest_doubling(Domain(Op) a, Domain(Op) b, Op op) { // Precondition: a >= b > 0 typedef Domain(Op) T; typedef QuotientType(T) N; N m = N(1); T c = op(b, b); while (c <= a) { m = twice(m); b = c; c = op(c, c); } return pair<N, T>(m, b); } template<typename Op> requires(HomogeneousFunction(Op) && Arity(Op) == 2 && ArchimedeanGroup(Domain(Op)) && Codomain(Op) == Domain(Op)) pair<QuotientType(Domain(Op)), Domain(Op)> quotient_remainder_nonnegative_with_largest_doubling(Domain(Op) a, Domain(Op) b, Op op) { // Precondition: a >= T(0) ^ b > T(0) typedef Domain(Op) T; typedef QuotientType(T) N; N quotient = N(0); while (b <= a) { pair<N, T> const m = largest_doubling(a, b, op); quotient += m.m0; a = inverse(op)(a, m.m1); } return pair<N, T>(quotient, a); }

`quotient_remainder_nonnegative_with_largest_doubling`

consistently placed second in all of my benchmarks.

### Prior Art

I haven’t been able to locate any references that deal with extending quotient and remainder algorithms by abstracting the operation type to calculate logarithms. This may be due to poor research skills on my part, the sheer obviousness of the conclusions I’ve drawn, or, hopefully, the novelty of my solution. Please let me know in the comments below.

The one approach I found that is very similar to mine is one taken by the code for calculating integer logarithms in the Scheme Standard Prelude. It is described in a Programming Praxis blog entry. They use the doubling technique to figure out which powers of 2 bound the logarithm, and then use a binary search between those two powers to find the answer. Here’s a translation of their Scheme code:

template<typename Op> requires(HomogeneousFunction(Op) && Arity(Op) == 2 && ArchimedeanGroup(Domain(Op)) && Codomain(Op) == Domain(Op)) pair<QuotientType(Domain(Op)), Domain(Op)> ilog(Domain(Op) a, Domain(Op) b, Op op) { typedef Domain(Op) T; typedef QuotientType(T) N; N lo(0), hi(1); T b_lo(identity_element(op)), b_hi(b); while (b_hi < a) { lo = hi; b_lo = b_hi; hi = twice(hi); b_hi = op(b_hi, b_hi); } while (hi - lo > 1) { N const mid = (lo + hi) / 2; T const b_mid = op(power(b, mid - lo, op), b_lo); if (a < b_mid) { hi = mid; b_hi = b_mid; } else if (b_mid < a) { lo = mid; b_lo = b_mid; } else return pair<N, T>(mid, inverse(op)(a, b_mid)); } return b_hi == a ? pair<N, T>(hi, inverse(op)(a, b_hi)) : pair<N, T>(lo, inverse(op)(a, b_lo)); }

This method also fared well in my benchmarks.

### Benchmarks

I’m using the Visual Studio 2013 Compiler with the /O2 flag running on Windows 7.

I tested all of the algorithms using 3 different types for `op`

:

- plus<long long>
- multiplies<long long>
- multiplies<boost::rational<long long>>

For `a`

I used values from 1 to 1,000,000. For `b`

I used the values 2, 3, 5, and 7. I borrowed this testing regimen from a Bonsai Code blog entry.

Setting `op`

to `plus<long long>`

calculates the normal quotient and remainder you remember fondly from elementary school. I benchmarked all of the algorithms and compared them against the builtin C++ operators for division and remainder.

algorithm | time (s) |
---|---|

builtin_quotient_remainder | 0.05 |

quotient_remainder_nonnegative_with_largest_doubling | 0.17 |

quotient_remainder_nonnegative_fibonacci | 0.20 |

quotient_remainder_nonnegative_2 | 0.35 |

quotient_remainder_nonnegative_1 | 0.36 |

quotient_remainder_nonnegative_0 | 0.38 |

ilog | 0.39 |

Suffice it to say that if you’re doing integer division and remainder, use the builtin C++ operators :).

Setting `op`

to `multiplies<long long>`

calculates integer logarithms without returning a meaningful remainder. Sometimes, all you need is a rough estimate of the logarithm.

algorithm | time (s) |
---|---|

ilog | 0.08 |

quotient_remainder_nonnegative_with_largest_doubling | 0.10 |

quotient_remainder_nonnegative_2 | 0.13 |

quotient_remainder_nonnegative_1 | 0.21 |

quotient_remainder_nonnegative_0 | 0.22 |

quotient_remainder_nonnegative_fibonacci | 0.27 |

Setting `op`

to `multiplies<boost::rational<long long>>`

calculates integer logarithms with rational remainders. This provides the most information, but rational arithmetic is slow compared to integer arithmetic.

algorithm | time (s) |
---|---|

quotient_remainder_nonnegative_2 | 2.99 |

quotient_remainder_nonnegative_with_largest_doubling | 3.09 |

ilog | 3.35 |

quotient_remainder_nonnegative_1 | 4.26 |

quotient_remainder_nonnegative_fibonacci | 4.42 |

quotient_remainder_nonnegative_0 | 4.98 |

### Conclusions

The fundamental insight I’m trying to convey is that it is possible to extend the existing Egyptian division algorithm to use operations other than addition and subtraction and still yield useful results. Substituting multiplication and division, for example, leads to logarithms and remainders.

There are several variations on the algorithms that have differing performance characteristics depending upon the types and operations in use. Try several to see which combination works best for your application.

Everything I know about abstract algebra, I’ve learned from videos and books by Alex Stepanov and his colleagues. I’m grateful to them for what I’ve learned, but I have much more to learn. There are surely errors in terminology, notation, and code above. Please gently point out my mistakes in the comments below or drop me an email.

Here’s the complete set of code and benchmarks in a single translation unit. The first part of the file is taken directly from header files on the *Elements of Programming* site. That code is copyrighted by Alexander Stepanov and Paul McJones. I kept only what I needed for these algorithms to keep the file short. You should use the real header files from the site because they contain many more useful algorithms. The rest of the file is copyrighted by me, David Sanders.

nice article. One minor nit though:

“The algorithm will work with any type that has an associative binary operation (a.k.a. a noncommutative additive semigroup). Examples of noncommutative additive semigroups include real numbers and polynomials.”

I don’t think commutativity is relevant here. It will work with commutative groups as well. An real numbers and polynomials ARE commutative.

LikeLike

Thanks for reading and commenting. Your assertions are correct that the algorithms don’t require commutativity and that polynomials and reals are commutative. The confusion is one of terminology. Here’s a relevant passage from chapter 7 of “From Mathematics to Generic Programming” by Stepanov and Rose: “Since we don’t need commutativity for our algorithm, we’ll say that A is a noncommutative additive semigroup. This means that commutativity is not required; it does not mean that commutativity is not allowed. In other words, every (commutative) additive semigroup is also a noncommutative additive semigroup.” You could easily convince me that “noncommutative additive semigroup” is a poor name for such an entity, but I wanted to be consistent with their terminology.

LikeLike

I didn’t know the quote from the book. And I don’t think it’s a great idea for him to redefine accepted mathematical terms in cases such as this. BTW it turns out that the terms semigroup, etc are defined differently by different authors and the “accepted” definition seems to evolve over time. All this adds to the confusion for “the rest of us”

At the risk of hijacking the thread. I’d like to take the opportunity to plug my session “C++, Abstract Algebra, and Applications at CPPCon 2016 this September. I’m hoping that those of us interested in this stuff (i.e. both of us) will find my take on this interesting.

LikeLike

Very interesting! I wonder if Alexander Stepanov know about this – I think you should write a letter to him.

I checked Knuth TAOCP – V.1, 1.2.2, Ex.25 – there is note about similarity of division and logarithm algorithms:

“This method [logarithm approximation] is very similar to the method used for division in computer hardware”

LikeLike

Thank you. I sent the link to Alex, Dan, and Paul. An email exchange with Dan and Alex several months ago inspired me to write this post. I’ll check my copy of Knuth tomorrow when I get to work.

LikeLike

Having received essentially the same comment from two people I respect (Robert Ramey and Paul McJones), I updated the post to change “noncommutative additive semigroup” to just “semigroup”. Robert, I’m looking forward to your CPPCon session. I won’t be in attendance, but I typically watch the videos after the fact.

LikeLike

Very nice article!

LikeLike