Kokkos

#notes #Portability #Cmake

 

Building and Installing Kokkos with Cmake

    # on my office computer
    cmake .. -DCMAKE_CXX_COMPILER=/usr/bin/g++ -DCMAKE_INSTALL_PREFIX=/data/code/myLibs/kokkos -DKokkos_ENABLE_OPENMP=On -DKokkos_ARCH_HSW=On -DKokkos_ENABLE_HWLOC=On
    # on mesocenter
    # source ../scripts/loadModules.sh 
    module purge
    module load userspace/all
    module load cmake/3.20.2
    # use GNU compiler
    module load gcc/7.2.0
    module load openmpi/gcc72
    export CXX=`which g++`
    cmake .. -DCMAKE_CXX_COMPILER=$CXX -DCMAKE_INSTALL_PREFIX=/home/jlv/libs/kokkos/GNU -DKokkos_ENABLE_OPENMP=On -DKokkos_ARCH_HSW=On -DKokkos_ENABLE_HWLOC=On
    # use GNU compiler
    module load intel-compiler/64/2020.2.254
    export CXX=`which icpc`
    cmake .. -DCMAKE_CXX_COMPILER=$CXX -DCMAKE_INSTALL_PREFIX=/home/jlv/libs/kokkos/INTEL -DKokkos_ENABLE_OPENMP=On -DKokkos_ARCH_HSW=On -DKokkos_ENABLE_HWLOC=On

    make -j 4 & make install

 

Small project (single source file) using Kokkos

Suppose we have an application called myApp.cpp, which uses Kokkos. The CMakeLists.txt codes such as following

    cmake_minimum_required (VERSION 3.12)
    project (myApp CXX)
    find_package(Kokkos REQUIRED)
    
    add_executable(myApp myApp.cpp)
    target_link_libraries(myApp PRIVATE Kokkos::kokkos)

    add_library(target ${SRCS})
    # Kokkos is needed to build the lib AND any apps use this lib
    target_link_libraries(target PUBLIC Kokkos::kokkos)
    # Kokkos is needed to build the lib BUT apps use this lab doesn't need Kokkos
    target_link_libraries(target PRIVATE Kokkos::kokkos)

And the application is simply built as

    cmake . -DKokkos_DIR=/data/code/myLibs/kokkos/lib/cmake/Kokkos
    make 
    export OMP_NUM_THREADS=8
    export OMP_PROC_BIND=spread OMP_PLACES=threads
    ./myApp

 

Inline Builds

For individual projects, it may be preferable to build Kokkos inline rather than link to an installed package. The main reason is that you may otherwise need many different configurations of Kokkos installed depending on the required compile time features an application needs. For example there is only one default execution space, which means you need different installations to have OpenMP or Pthreads as the default space. Also for the CUDA backend there are certain choices, such as allowing relocatable device code, which must be made at installation time. Building Kokkos inline uses largely the same process as compiling an application against an installed Kokkos library.

For CMake, this means copying over the Kokkos source code into your project and adding add_subdirectory(kokkos) to your CMakeLists.txt.

 

functor: passing computational bodies to Kokkos

    // Defining the functor (operator+data)
    struct AtomForceFunctor {
      ForceType _atomForces ;
      AtomDataType _atomData ;
      AtomForceFunctor (ForceType atomForces , AtomDataType data ) :
        _atomForces ( atomForces ) , _atomData ( data ) {}

      void operator()( const int64_t atomIndex ) const {
        _atomForces [ atomIndex ] = calcul ateForce ( _atomData );
      }
    }

    // Executing in parallel with Kokkos pattern
    AtomForceFunctor functor ( atomForces , data );
    Kokkos::parallel_for ( numberOfAtoms , functor );

 

What are C++ functors and their uses?

A functor is pretty much just a class or struct which defines a public operator() instance method. That lets you create objects which “look like” a function:

    // this is a functor
    class add_x {
      add_x(int val) : x(val) {}  // Constructor
      int operator()(int y) const { return x + y; }

    private:
      int x;
    };

    // Now you can use it like this:
    add_x add42(42); // create an instance of the functor class
    int i = add42(8); // and "call" it
    assert(i == 50); // and it added 42 to its argument

    std::vector<int> in; // assume this contains a bunch of values)
    std::vector<int> out(in.size());
    // Pass a functor to std::transform, which calls the functor on every element 
    // in the input sequence, and stores the result to the output sequence
    std::transform(in.begin(), in.end(), out.begin(), add_x(1)); 
    assert(out[i] == in[i] + 1); // for all i

There are a couple of nice things about functors. One is that unlike regular functions, they can contain state. The above example creates a function which adds 42 to whatever you give it. But that value 42 is not hardcoded, it was specified as a constructor argument when we created our functor instance. I could create another adder, which added 27, just by calling the constructor with a different value. This makes them nicely customizable.

As the last lines show, you often pass functors as arguments to other functions such as std::transform or the other standard library algorithms. You could do the same with a regular function pointer except, as said above, functors can be “customized” because they contain state, making them more flexible (If one wanted to use a function pointer, he would have to write a function which added exactly 1 to its argument. The functor is general, and adds whatever you initialized it with), and they are also potentially more efficient. In the above example, the compiler knows exactly which function std::transform should call. It should call add_x::operator(). That means it can inline that function call. And that makes it just as efficient as if one had manually called the function on each value of the vector.

If one had passed a function pointer instead, the compiler couldn’t immediately see which function it points to, so unless it performs some fairly complex global optimizations, it’d have to dereference the pointer at runtime, and then make the call.