Graphical representation of Cartesian product of 2 sets.

Cartesian Product Iterator in C++

Iterators are fundamental concepts for using C++ STL, as they provide access to containers, but they can be extended for more general applications. Possibly the major example of this is the Boost.Iterator library, where many new iterators are defined for very diverse purposes. I’ll show in this post how they are used and how they inspired the creation of the product iterator for C++, which is an iterator that computes the Cartesian product of other iterators, much like itertools.product in Python. The code for the library is available at GitHub.

Boost.Iterator examples

Consider the problem of iterating over a sequence of values, like one we would find when performing Monte Carlo simulations. Surely we could write something like

to create the sequence, but this isn’t good practice since if we changed numbers to be a list, then the code would be broken. That’s why we have the STL: to help us write good, clean code. Moreover, if we just wanted to pass the sequence to some function, then we’d have to fill some memory just to place these numbers.

However, how could we write this using STL-like code? We have no container that lists numbers! That’s where Boost.Iterator comes into place.

One of the iterators defined there is called counting_iterator, which does exactly what it says: creates an iterator that can be used to count from a number to another number. Using it, we can re-write the previous code as shown in the counting_iterator page:

This provides a good, clean, class-agnostic interface, which is exactly what we need when writing maintainable C++ code. If the container will be passed to a function to process, maybe we could just pass the begin and end iterators, which would save us O(N) memory.

Another good example is the problem of traversing two containers at the same time (or one container and one “virtual” container, like the counting example before). Although we could write something like

this code is hard to maintain. Again, Boost provides us with the answer: the zip_iterator, which allows us to iterate over multiple things at the same time.

Python iterators

Someone familiar with  Python probably recognized that it provides these solutions already. In Python, the first example can be solved by just calling range(N) or xrange(N), and the second one is solved by the izip() function in the itertools library. These problems are common enough that almost everyone had to solve them one time or another.

However, I have encountered many times a particular problem that is solved by the itertools but that I couldn’t find a good existing solution for C++: performing a Cartesian product. As a motivator, consider the problem of performing Monte Carlo simulations while testing the combination of two parameter values. In Python, we can write:

and we’d be done. So simple, so elegant. In C++, I had to write something like

which is much harder to write and to keep track of the variables (don’t forget about typos too). Therefore, a C++ Cartesian product iterator was required.

Product iterator in C++

When looking around the web, I found 3 major solutions:

  1. Assuming that all containers have the same type, some hack could be written to iterate over all of them;
  2. Using all containers, a function created a vector of tuples with all combinations of iterators possible;
  3. Some wrapper codes had to be used around the main computation.

The second solution was close to what I wanted, but it required O(M*N^M) space, where M is the number of containers and N is the size of each container. This cost comes from the total N^M combination of tuples, each one with size O(M). This number could easily get out of hand, so it was a deal breaker.

However, remember the example of the counting_iterator. It replaces something that requires O(N) memory with something with O(1) memory, since only the begin and last iterators are required. How does it achieve such reduction? Lazy computation. Basically, instead of performing all the computation required at one point in time, which is the creation of the vector in the example, and then just using the results, the computation is performed at each step, which avoids the memory requirement. This is exactly the problem with solution 2: it computes everything first and then just uses them, which makes it a perfect candidate for lazy computing.

With this in mind, I created the C++ product iterator, which is a very simple class to lazily perform the Cartesian product while requiring O(M) memory and is available on GitHub. As an application example, consider the following code:

This example creates two vectors and iterates over their Cartesian product terms. The object creation is made easier by the function make_product_iterator and the C++11 keyword auto solves the possibly complex type. Moreover, the iterator is built in a way that the last container is traversed first, which is consistent with the order one would expect from a bunch of fors and is the same order returned by the Python iterator.

It’s important to highlight that the values provided by the product iterator are constant references. Although it would be possible to return non-constant references, I think it’s kind of strange to allow the value to be modified, which would cause a latter iteration to get a different value for the same position.

Differences to Boost.Iterator

Although I used the iterator_facade from Boost.iterator, one notices two major differences between this and the iterators defined in Boost.

First, the make_product_iterator receives the full containers, instead of its iterators. If it had received just the begin iterators for the new product begin and the end iterators for the new product end, then it wouldn’t have the end() information at the current iterator. When we consider an iterator such as the zip_iterator, we are able to separate the begin and the end since all the iterators are traversed in the same direction. For the Cartesian product, it first iterates over the last container and, when it reaches the end, it increments the previous container and resets the current one.

If the current iterator didn’t have the information about the end iterators for each container in it, it wouldn’t be able to tell when it were finished with one container and needed to increment another container. One might think “But you have the end() information at the new end iterator!”. True, but we can’t use it directly. The only “communication” between the current and end iterator is in the statement it != end, which should return false only after all combinations were traversed. Hence, the current iterator requires the end() information.

While the function could ask the user to provide the begin and end for each container instead of the container itself, I thought it wouldn’t be a good idea since it would be asking them at the same place (arguments for the same function) and most uses will probably come directly from containers, so we could gather the iterators ourselves instead of relying on the user.

The second difference is that the end iterator comes from a current iterator through the method get_end(). I thought a lot about this, since this is a strange behavior. I didn’t like the idea of creating another auxiliary function just to create the end iterator, since if someone changed the call to make_product_iterator but forgot to change the make_product_end_iterator, there would be a problem. Besides, there would be unnecessarily replicated code. Another possibility would be to return a tuple from make_product_iterator, and the user could use the std::get<>() function to get the desired iterator, which I don’t find practical and is error-prone, or to use std::tie, which would require knowing the types beforehand, something we tend to avoid when using complex types. Thus I decided to use the best alternative I could think.

Auxiliary method and function

Once set aside these differences, you may be curious why I created the method get<>(), which behaves just like std::get<>() in the iterator value. For the product iterator, I decided that the tuple would be a reasonable value_type, since it can return multiple values. To avoid copying the containers values, the iterator returns tuples of constant references to the values in the original containers. Therefore, the code builds a tuple each time the reference operator is called, but caches it in case it’s called multiple times before incrementing. Nonetheless, this tuple creation may be useless if the user just wants access to the elements. Because of this, the method get<>() was created, which allows direct access to the elements without requiring a tuple to be created.

Another point is that the use of containers instead of its iterators to create the product iterator causes a problem when using it with some iterators in boost, since they don’t have containers associated with them. To solve this issue, the library also provides a method to build a proxy for iterators, as shown in this example:

For the purpose of iterating, c2 behaves just like c1 whose value_type is a constant reference, but the iterators passed as arguments don’t require an associated container.

Comparison between C++ code with and without product iterator

Getting back to the original inspiration problem, I repeat the original C++ code to iterate over a grid of 2 values N times:

Now, compare this code with the version using the product iterator:

I think the second version is much cleaner and expresses better what is happening in the code, besides being able to be used as argument for some function to iterate over the combinations.

If you are interested in this library, I’d recommend checking it on GitHub and giving it a try. I’d love to hear some feedback and potential bugs!