On the problem of database functions implementation "fan out"​ - part III

Omer Pinto
February 11, 2021

On part 1 of this article, we introduced the problem of DB function fan-out, and we figured every such function is actually a multi-function with 2^n versions, given n inputs. On part 2 of this article we've seen a few solutions to the problem (including looking at two open-source databases MonetDB & ClickHouse). Before I'll introduce my solution on part IV, I've decided to add a preliminary article (this one), so I'll have the chance to discuss more deeply some of the TMP techniques and classes that will be used as building blocks on that solution.

Another advantage of this approach would be introducing a much more digestible code in the next article. If the reader internalizes the techniques in this part and the classes introduced, he/she will be much more prepared to digest the detailed solution at the next part. I admit, it is complex, but I think that with a structured, gradual approach, it is achievable and worth your while.

Surely, I couldn't go through all the necessary material in C++ templates & TMP from the very beginning. It's just too much content to go through and it isn't the purpose of this article. I'll try to explain every advanced technique used, but surely a knowledge of variadic templates is a perquisite. Before diving in, I'd like to recommend an excellent, short text, that introduces TMP in a unique way that might be beneficial for our purpose (it is 'to-the-point' and might be a great stepping-stone for start using TMP in your programs):

No alt text provided for this image

The authors Edouard Alligand & Joel Falcou refer to it as a report of C++ metaprogramming (written after C++ 14 was introduced). If you want a more thorough, comprehensive discussion on the topic, I would recommend C++ templates by David Vandevoorde, Nicolai Josuttis & Douglas Gregor (TMP bible on its 2nd version).

Introducing std::integer_sequence

In template metaprogramming there is no iterative construct. You can't iterate on a list of types (like the types inside std::tuple) with a for or while loop, as you would do in code that executes at runtime. You can however use recursion to get access to every type in the list and decide what you want to do with it. Let's see an example:

No alt text provided for this image

This is a simple example of a template class that can hold a std::tuple of any type and provides a way to print it - element by element. The class is templated on the types inside the tuple (Args) using a variadic template. Let us focus on the print_tuple (line 38) and print_tuple_impl (line 18) implementation first (ignore the efficient versions for now).

print_tuple_impl is a template function, that is templated on the index of the tuple: size_t k; For each such k, the function prints the kth element of the tuple and calls recursively for the next (k+1)th version of the function. The recursion stops when k equals the size of the tuple minus 1 (for a tuple tup of size n, we can call std::get<i>(tup) for every 0 <= i < n). The way the constant value is checked is by using the new C++ 17 feature if constexpr that lets us evaluate whether we got to the stop condition of the recursion. In this kind of condition we must only use values that are declared as constexpr and are known in compile-time (it is not the regular if construct from runtime). When we get to the last element, we print it and finish (avoiding a further recursive call). Finally, the public function "wrapper" print_tuple, calls print_tuple_impl<0> to start printing the tuple from the first index.

All right, you might think, what's the big deal? Why couldn't I just do it in a 'regular' for loop? You might try and do something like that:

No alt text provided for this image

When you actually try to call this function (with any tuple as argument), you'll get a horribly-looking output with lots of compile time errors. Looking closely you'll see that the relevant error states: "note:  template argument deduction/substitution failed: error: the value of ‘i’ is not usable in a constant expression". Simply put: you can't use running loop index as a constant is the expression std::get<i>(t). The reason for that is that std::get is templated on size_t template argument, and in order for the compiler to instantiate and call the right function, it needs to know at compile time what is the value of i. So, I hope I convinced you that we need TMP (as in the previous solution for that).

Now, the recursive approach shown above has been used since the 2000's in various cases and was quite powerful. However, it does have two disadvantages, the same ones that are held against TMP in every argument: it bloats our compiled binary and extends our compile time significantly. Indeed - if it's not enough that we actually create a class per tuple-type (as every different tuple type will instantiate the class using the Args template parameter differently), we generate a huge amount of "intermediate" functions of print_tuple_impl (the amount equals to the size of the tuple...).

This should be avoided at any cost! No one likes long compilation hours and large binaries.

Let's have a look at an alternative; For that, I'll use a C++ 14 class named std::integer_sequence. This class uses a variadic template of integers in order to provide us with a list of consecutive integers that could be used in compile time. For instance, if you call: std::make_index_sequence<10>{}, you'll create the sequence 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Now, once that sequence is available as a compile-time constant, we can use it with variadic template expansion to print out our tuple without recursion:

  • Have a look at print_tuple_efficient_impl on line 30 in the figure above. We want this method to get a std::index_sequence with the length of the tuple. So we write it as a template function with variadic template parameter I of type size_t (template<size_t...I>), which takes as input a parameter of the type of the sequence itself: std::index_sequence<I...>.
  • Then, we use a new feature of C++ 17, called fold expression to print the values of the tuple with a variadic expansion on line 32. This is a bit tricky, but the basic idea is that the inner expression (std::cout << std::get<I>(_tuple) << ",") is executed for "every I" running from 0 to the tuple's length minus 1 (using the ... expansion operator). There are some associativity rules as you can read on the link above, but basically the fold-expression let's you reduce a parameter pack (I in that case) over unary / binary operations (in that case the entire std::cout operation placed inside the inner parenthesis). Thinking about it, it let's you execute the reduce function you might know from your favorite functional language, only it let's you do that using compile-time structures.
  • Finally, you can use the public function that wraps this meta-function, called print_tuple_efficient (line 44). This function calls std::make_index_sequence template function with the size of the tuple (a constexpr value!) in order to call the inner function print_tuple_efficient_impl with a parameter of type std::index_sequence<I...>, when I = tuple size.

This technique lets us implement the same functionality as before without generating any unnecessary intermediate types. Whenever you can, you should use the ... operator to apply a callable to every member of a variadic compile-time type, and not use recursive structures. This causes faster compilation time, it doesn’t generate all the unneeded intermediate types, the code is more concise and the binary size is smaller (and another argument against TMP goes down the trash :) ).

Back to our case

Now that we've understood the technique of using std::make_index_sequence, we are ready to have a look at our building blocks for the TMP solution to the database fan out problem. Recall from the 1st part, that while dealing with a multi-function with n inputs, we can use a truth table of n bits (whose size is 2^n) to help us identify every one of the 2^n different versions of the function. For instance, for the function subString with the signature subString(string, int, int), we'll have the following table:

truth table

Thus, we get that every parameter with 0 on its column is a constant (single-value) parameter input, and every parameter with 1 is a vector. Notice that the most-significant bit is the last parameter (third to the right in the function signature) and the least-significant bit is the first parameter. This representation of the different versions of the function subString might give us a hint about what we might need to supply to our meta-function in order to support different versions in a single place. Looking at the bits at the table, we get the binary numbers 000 - 111, which are the numbers 0 - 7. If we could supply our function with the version number (which is a compile-time constant), it might be able to use it in order to adapt the implementation to the subString's version written on the last column in the table above. This smart function will know how to "polymorphically" get each input parameter type according to its bit (either constant or vector) and implement subString's functionality only once. This is a compile-time "polymorphism".

But that's for another day: the full solution for that will be shown in part IV of this article. For now, let us ask the following question: how can we use the abstract idea of version number, presented in the table above, to define subString's polymorphic input (8 different versions)? I'd like to present here, in detail, two different, possible solutions. Although I'll eventually use only one of them in my own solution later, I think we can learn from examining both of them.

Tuple Splitter - concept introduction

Given a tuple containing subString's input types (in the same order): std::tuple<string, int, int> and a version number (which I prefer to call Mask as you'll soon understand why) between 0 and 7, I can use this Mask to get the tuple split to 2 separate tuples - one will represent a tuple of constants and the other will represent a tuple of vectors. Sounds complex? the idea is very simple. Let's look at the following example (read the '&' operation below, column-by-column):

input: std::tuple<std::string, int, int>, Mask = 0b101
1)vector_tuple (inclusive) = String, int, int
                              &      &    &
                              0      1    1
                          = std::tuple<int, int>
2) constant_tuple (exclusive) = String, int, int
                                 &      &    &
                                ~0     ~1   ~1
                             = std::tuple<string>

We actually run the Mask over the type list inside the tuple (in std::tuple<x,y,z>, x is obtained at index 0 and z at index 2. That's why we write the Mask backwards in the image above). The result of masking is the vector tuple result (011 indicates this vector will include 2 integers). Using the ~Mask (not operator) gets us the the scalar tuple (which will include only a string). Generally, for all possible Mask values, we'll have:

No alt text provided for this image

Please notice that every line is different. Even if lines 4 & 6 looks the same, there aren't: although both parameters 2 & 3 has the same type (int), they are still 2 different parameters in the subString function with a different meaning.

Now, assuming we could implement this splitter. How can we use that? Once we've split the input, we could decide that we implement a template method / class that will get the 2 input tuples separately from one another and then somehow get them intertwined to a single input (with the original order), as in the scalar version of subString. This could be useful in cases in which the subString version gets executed a lot of times with the same input constants but with different input vectors (think of an input vector of size 10^7 that we've decided to divide to 1000 bulks of size 10,000 running subString on each bulk separately). In that case, we could have utilized the tuple splitter so we can get the tuple of constants only once (e.g.: through a constructor of a wrapping class) and the different input vectors through function parameters (different at every invocation). We won't deal with the usage of that code right now, just with its implementation.

Tuple Splitter - C++ implementation

Note: The solution presented here and also the one in the next section could be found in the project TMP-Tutorial on my git right here. It includes the template metaprogramming classes and some tests to show how to use them.

Let's have a look at the tuple splitter solution:

No alt text provided for this image

As before, we have some kind of a wrapper class (like we had a wrapper function) and an inner class. We also use the std::index_sequence technique presented above to loop through the tuple's list of types. Take a deep breath, and let's dive into the solution:

  • First we declare a TupleSplitter structure on line 20 that is templated on a Type (will be used to hold the full input tuple), size_t Mask and another type I. We write this general struct with no implementation, so we could use template class specialization on lines 23-35 and get the types we need (a.k.a.: std::tuple<Types...> & std::index_sequence<I...>).
  • Next, on lines 23-35 we use a template specialization: we get a variadic type Types (held inside a tuple on the specialization), a Mask of type size_t for our mask, and variadic type of I (held inside an index_sequence on the specialization).
  • We use the notion of metafunction (From Boost.MPL). It is "a class or a class template invocable at compile-time". Such metafunctions have their inputs passed as template parameters, and their return type(s) as an inner type definition(s). Thus, inside the class we declare inner types, with the using clause. These are the results of the metaprogram and are equivalent to a return value of a function. So instead of a function with a return type, we have a metaprogram (TypeSplitter) with results types.
  • There are 3 defined types: an auxiliary type named tupleType on line 26 used to hold the full tuple type (std::tuple<Types...>) and another 2 types inclusive & exclusive, which defines the vector tuple & scalar tuple (accordingly) as presented above. These are the heart of this solution and their code will be analyzed next.
  • We'll focus on the inclusive definition on lines 27-30 (exclusive definition is very similar). On the heart of that implementation is the use of std::conditional, which is used to decide between 2 types. It's definition is: template< bool B, class T, class F > struct conditional, and in cppreference its documentation reads 'Provides member typedef type, which is defined as T if B is true at compile time, or as F if B is false'. So in our case, if the condition ((Mask >> I) & 0x1) is true, we get a tuple with a single type: the input tuple's type at current index, and if it's false, we get an empty tuple.
  • The condition actually uses the mask as suggested above: against any index 0 <= i < input_tuple_size, it checks whether the mask contains '1' at the relevant place (we shift right the mask i places and AND the last bit using another mask 0x01). If it does, the relevant type, given by the trait: typename std::tuple_element<I, tupleType>::type is inserted to single-type tuple (basically std::tuple_element gets a constant index I and returns the Ith type inside the tuple given by tupleType defined above). If it has '0', then the result of the compile-time condition will be tuple<>.
  • The result of std::conditional is collected on line 29, using the ::type after the angular template closing bracket of the condition (>). Then, it's initiating an object of that type by using {}.
  • Hence, every such std::conditional execution evaluates the necessary mask to determine whether the type on that index "should appear" on the results vector tuple. But wait - how do we "collect" all these separate results to one tuple? Like before - we use the variadic template expansion ..., this time with the help of the function std::tuple_cat. This function constructs a tuple that is the concatenation of all the tuples on its input! So we use the ... expansion to fill std::tuple_cat with the necessary inputs (this function, as you might expect, ignores empty tuples, just like we need!). Please note that this function actually gets tuples as input, and not tuple types, and this is why we instantiate these tuples by using {} as explained above.
  • Then, in order to define a type (which is eventually our purpose), we use decltype at line 27 to get the type of the result of the call to std::tuple_cat, and this is our result inclusive type.

Before moving to the outer wrapper type and conclude the implementation, let's have a look at what happens (at compile time!) in that complex code of type inclusive in an example; Let's assume we call TupleSplitter with std::tuple<int, string, double>, 0b110 & std::make_index_sequence<3>:

  1. For index 0 we'll have the condition results: (110 >> 0) & 0x1 = 0 and end with the empty tuple.
  2. For index 1 we'll have condition result (110 >> 1) & 0x1 = 11 & 01 = 1, and end with the tuple: tuple<string>.
  3. For index 2 we'll have condition result (110 >> 2) & 0x1 = 1 & 1 = 1, and end with the tuple: tuple<double>.
  4. Finally, we'll concatenate them all (using tuple concatenation) to the result of inclusive= std::tuple<string, double> which is the vector tuple we wanted to get by this mask. The exclusive checks for the opposite condition (looks when the mask equals 0), and will end up with: tuple<int>.

Now, let's conclude the discussion on TupleSplitter by examining the wrapping type TupleSplitterWrapper; Again, nothing new here: using the same technique of template specialization for the variadic type Types that will be included inside a tuple (designates the full tuple input). In addition, as in the 1st example above, we use std::make_index_sequence, this time with the tuple size (given by applying the operator sizeof... on the parameter pack: sizeof...(Types)). This wrapper calls the inner TupleSplitter class with both variadic templates required for it to do its job. Finally, the wrapper class defines it's own types to expose the inner class types: inclusiveType & exclusiveType (exposing the inner inclusive & exclusive accordingly).

Before moving on to the next use-case, let's see how can we check (at compile time!) that our class actually works. For that, have a look at the file TupleSplitterTest.cpp in my git. There are many examples / tests there and I'll refer only to one here:

No alt text provided for this image

We use a tuple of length 4 (std::tuple<int, float, double, bool>) as the full input tuple and call the Wrapper with mask 0b1010. We collect the inner types (inclusiveType & exclusiveType) and use static_assert (a compile time test that fails compilation in case the compile-time testable condition fails) with std::is_same trait, to test whether these types are equal to what we expect. The mask suggests that we'd like to split between the 1st & 3rd tuple types (marked in the mask by zeroes, so they'll be only on the exclusive type - the scalar tuple) and 2nd & 4th tuple types (marked in mask as ones, so they'll be only on the inclusive type - the vector tuple). You can look that this is indeed the case on lines 29-30. Additional tests on the file also show a "full" example of a tuple of size 3 (like in Substring) & examining all possible Masks applied on them using the TupleSplitterWrapper.

This concludes the example of TupleSplitter.

Functor Input Setter - concept introduction

This example is quite similar to the previous one, and therefore we'll introduce it much more briefly. You can find the full code in FunctorInputSetter.h on my git (including tests in FunctorInputSetterTester.cpp). Here, we'll implement a metaprogram class that will actually build the input type as described in the example at the first table image of subString above (using types and vector of types in the same tuple). More precisely: given a tuple of inputs and a version number (again using Mask) between 0 and tuple size - 1, I use this Mask to get a new, single tuple of the same length. This time, the mask decides for every type T at a time, whether to include T in the result tuple (i.e.: T represents a constant input element) or to include std::vector<T> in it (i.e.: T represents a vector input element).

This class is more appropriate to a use-case in which we would like to execute the function once (not executing it once per bulk, for multiple bulks, like suggested in the previous tuple splitter example). In that case, instead of splitting the input, we plolymorphically transforming it at compile time to the required input, given by the function version(mask). Moreover, this will be the version used as a building block on my solution that will be presented on part IV of this article.

Functor Input Setter - C++ implementation

Let's have a brief look at the code (it's very similar to TupleSplitter):

No alt text provided for this image

As you can observe, the structure is similar to TupleSplitter, with wrapper class and an impl one with class template specialization. There are a few changes though:

  • This time we only describe one output type (as we don't split the tuple but rather create a new one) - called type.
  • The condition inside std::conditional is the same as before, but its true and false compile-time result types are different. When the condition evaluates to true, for type T, we return the type const std::vector<T>&, and when it evaluates to false, we just return const T& .
  • We don't need std::tuple_cat anymore, because we always return something: const T& or const std::vector<T>& (on the previous example we had a case in which we needed 'no' type, so we used an empty tuple and had to concatenate it at the end. Here, instead, we use std::tuple directly with the variadic template expansion of ... (line 29).
  • Because, as mentioned earlier, this is the version we're going to use on the solution presented on part IV, I've added const reference attributes to the result type. This will be very important when we'll examine the performance of the solution later.

The use of the wrapper class is the same as before, so you can observe it on your own (lines 32-39). Finally, let's have a look at an example of using this class:

No alt text provided for this image

This is a 'full' example, as it checks all possible masks for an input tuple of length 3. Let's look into one mask value: 011. In that case the full input tuple is declared as baseTuple = std::tuple<string, float, double>, so using FunctorInputSetterWrapper with this input and mask 0b011 will cause the return type to transform only the 1st and 2nd tpyes of the tuple to vectors. This will result in the following tuple: tuple<const std::vector<std::string>&, const std::vector<float>&, const double&>, as shown in the static assert on lines 44 - 46.

Note: on the last example, using a reference to double seems unnecessary (no one uses a reference to simple numerical types in C++ as far as I know). However while this TMP class is used with other type, like a string or any other user-defined type, this reference avoids copying of data. In order to be uniform with all possible function inputs (simple types vs complex types) and all possible masks (leaving some types as-is, turning others to std:vector), we need to use this const reference "decoration" to all types. There might be a way to avoid this for a simple, non-vector, types, but that we'll surely complicate the solution even further, and it's not the purpose of this article.

That's it for this part. I believe there is much to digest here. I'm happy I've included this part as a standalone, so the readers can digest it appropriately before dealing with the code in the next part (I've done an aware effort to drill down to details as much as possible, but this isn't a book on templates or on TMP :)). The FunctorInputSetter will be used as a building block on the next part, and also are some of the techniques presented above.

Did you find this part interesting? innovative? boring? too-technical? or maybe still too hard to digest despite the explanations above? Hit the comments and let me know. I hope you enjoyed yourself and will join me on the next, final part of this article.

Omer Pinto

Experienced Senior Software Engineer with a demonstrated history of working in the computer software industry. Skilled in Parsing, Compiler Optimization, Databases, and Big Data Analytics. Strong engineering professional with a Bachelor of Science (BSc.) focused in Mathematics and Computer Science from The Open University of Israel.
Enthusiastic about algorithm & data structures design, multi-threading systems and challenges, and modern C++ advanced templates metaprogramming techniques.

Related Posts

Newsletter ItalyClouds.com

Thank you! Your submission has been received!

Oops! Something went wrong while submitting the form