Use of inline keyword in OpenCL code

After moving to a more recent architecture, in particular an AMD Instinct MI50 (Vega20) GPU, it turned out that some kernels, e.g. those in the file ocl_kernel/force_gauge_tlsym.cl, were failing to compile, giving the bizarre error

lld: error: undefined hidden symbol: gauge_force_tlsym_per_link
>>> referenced by /tmp/comgr-58e10f/input/linked.bc.o:(gauge_force_tlsym)
>>> referenced by /tmp/comgr-58e10f/input/linked.bc.o:(gauge_force_tlsym)
Error: Creating the executable from LLVM IRs failed.

which was surprising, since the OpenCL toolchain was claiming not to find the definition of gauge_force_tlsym_per_link when building the gauge_force_tlsym kernel, although the function is defined in the same file just above the kernel.

After lots of time experimenting and googling, I finally stumbled on this ROCm issue, which shed lots of light on the problem. The important points are

This sounds like misuse of the inline keyword. Clang conforms to C99 for inline, which gives different semantics to either GNU C89 or C++. See here for more information.

You should just be able to use static inline instead of inline

and

Remember OpenCL C language is based on C99, so it uses the C99 semantics (which are different than the c89 or c++ semantics) for the inline keyword. For C99 static inline is the proper declaration for functions that should be inlined in the current translation unit.

plus the important explanation from the Clang link in the first quote, which I copy here for future reference.

inline int add(int i, int j) { return i + j; }

int main() {
  int i = add(4, 5);
  return i;
}
In C99, inline means that a function's definition is provided only for inlining, and that there is another definition (without inline) somewhere else in the program. That means that this program is incomplete, because if add isn't inlined (for example, when compiling without optimization), then main will have an unresolved reference to that other definition.
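
A minimal sketch of the fix suggested in the quotes above, applied to the Clang example. The function compute is a hypothetical stand-in for main, introduced only for illustration. With static inline the function has internal linkage, so the translation unit always carries its own definition and the call resolves whether or not the compiler inlines it.

```c
/* C99: 'static inline' gives add internal linkage, so this translation
   unit always has a callable definition, even without optimization. */
static inline int add(int i, int j) { return i + j; }

/* Hypothetical caller standing in for main() in the quoted example. */
int compute(void) {
  return add(4, 5);
}
```

Building this with optimizations disabled should link without an undefined reference to add, unlike the plain inline version.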

Conclusions and what to do

Our usage of inline seems to be basically wrong. From the information above, my explanation of the issue in the molecular dynamics code is that, for some reason, the compiler does not inline the gauge_force_tlsym_per_link function, so the linker looks for a non-inline (external) definition of it. Since no such definition is provided, the reference stays unresolved. To confirm this, I compiled the same kernel with the option -cl-opt-disable, which switches off all optimizations and hence inlining, and this resulted in many more undefined references, namely to all inline functions used (get_even_st_idx_local, get_odd_st_idx_local, calc_rectangles_staple, etc.).
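
For comparison, here is a sketch in plain C99 of what the "other definition" expected by the linker looks like. The names plaquette_count and count_for_4d are hypothetical and not taken from the codebase. This only illustrates the C99 rule behind the error; it is not meant as a suggestion for the kernel code.

```c
/* Inline definition: on its own, C99 treats this as "for inlining only"
   and emits no external symbol for it. */
inline int plaquette_count(int n_dims) { return n_dims * (n_dims - 1) / 2; }

/* A file-scope declaration with 'extern' turns the definition above into
   an external definition in this translation unit, so the linker can
   resolve calls that were not inlined. */
extern inline int plaquette_count(int n_dims);

/* Hypothetical caller; links even when nothing is inlined. */
int count_for_4d(void) { return plaquette_count(4); }
```

Removing the extern declaration and compiling without optimization would be expected to reproduce an undefined-reference error of the same kind as the one reported above.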

That the OpenCL kernel language is based on C99 is also explicitly stated on Wikipedia

OpenCL specifies programming languages (based on C99 and C++11)

Hence, the fact that the inline functions used all over the .cl files in the codebase never gave problems means that either the compilers used in our projects always inlined them, or they implemented the inline keyword wrongly (or simply ignored it). The latter might have been the case for the old cards used so far, whose compilers date from the early days of OpenCL.

Now, what should we do? After having read here and there

I got convinced that in our framework, where every kernel is compiled by basically creating a single source file with all needed functions above the kernel(s), we should really mark OpenCL functions to be used in kernels as static inline. It should first be checked that this does not break the code on older architectures, whose compilers hopefully do not do the wrong thing when encountering static inline.

@cuteri Can you give me your feedback here?

Experiment on various architectures
Decide what to do
If it is the way to go, fix keywords in kernel code

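A quick grep for inline in the ocl_kernel folder shows that there are functions where the inline specifier is given after the return type. As far as I can tell, the C99 grammar accepts declaration specifiers in any order, so this is a matter of consistency rather than correctness, but it is unconventional and should be made uniform.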

Fix position of inline specifier

Edited Jan 20, 2021 by Alessandro Sciarra
