Use of inline keyword in OpenCL code
When moving to a more recent architecture, in particular an AMD MI50 Instinct Vega20 GPU, it turned out that some kernels, e.g. those in the file ocl_kernel/force_gauge_tlsym.cl, were failing to compile, giving the bizarre error
lld: error: undefined hidden symbol: gauge_force_tlsym_per_link
>>> referenced by /tmp/comgr-58e10f/input/linked.bc.o:(gauge_force_tlsym)
>>> referenced by /tmp/comgr-58e10f/input/linked.bc.o:(gauge_force_tlsym)
Error: Creating the executable from LLVM IRs failed.
which was surprising, since it meant that the gauge_force_tlsym_per_link definition could not be found when building the gauge_force_tlsym kernel, although the function definition was in the same file just above the kernel.
After lots of time experimenting and googling, I finally stumbled on this ROCm issue, which shed lots of light on the problem. The important points are
This sounds like misuse of the inline keyword. Clang conforms to C99 for inline, which gives different semantics to either GNU C89 or C++. See here for more information. You should just be able to use static inline instead of inline
and
Remember OpenCL C language is based on C99, so it uses the C99 semantics (which are different than the c89 or c++ semantics) for the inline keyword. For C99 static inline is the proper declaration for functions that should be inlined in the current translation unit.
plus the important explanation from the Clang link in the first quote, which I copy here for future reference.
inline int add(int i, int j) { return i + j; }

int main() {
  int i = add(4, 5);
  return i;
}
In C99, inline means that a function's definition is provided only for inlining, and that there is another definition (without inline) somewhere else in the program. That means that this program is incomplete, because if add isn't inlined (for example, when compiling without optimization), then main will have an unresolved reference to that other definition.
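For reference, here is a minimal sketch in plain C99 of the two usual ways to make the quoted program complete. The names add_static, add_extern and use_both are hypothetical and only serve the illustration. The second option is standard C99 behaviour; whether it is practical in OpenCL kernel code has not been checked here.

```c
#include <assert.h>

/* Option 1: static inline gives the function internal linkage, so this
 * translation unit always carries its own definition, inlined or not. */
static inline int add_static(int i, int j) { return i + j; }

/* Option 2 (plain C99): keep the inline definition and add one extern
 * declaration, which turns it into an external definition that the
 * linker can resolve when a call is not inlined. */
inline int add_extern(int i, int j) { return i + j; }
extern inline int add_extern(int i, int j);

/* Hypothetical caller standing in for main() from the quoted example. */
int use_both(void) {
    int a = add_static(4, 5);
    int b = add_extern(4, 5);
    assert(a == 9 && b == 9);
    return a + b;
}
```

Both variants should link even when optimizations are disabled, which is exactly the situation in which the plain inline version leaves an unresolved reference.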
Conclusions and what to do
Our usage of inline seems to be basically wrong. From the information above, my explanation of the issue in the molecular dynamics code is that, for some reason, the gauge_force_tlsym_per_link function is not inlined by the compiler, so the linker looks for its external (non-inline) definition; since no such definition was provided, the reference stays unresolved. To confirm this, I tried to compile the same kernel with the option -cl-opt-disable, which switches off all optimizations and hence avoids inlining, and this resulted in many more undefined references, namely to all inline functions used (get_even_st_idx_local, get_odd_st_idx_local, calc_rectangles_staple, etc.).
The fact that OpenCL C is based on the C99 standard is also explicitly stated on Wikipedia
OpenCL specifies programming languages (based on C99 and C++11)
and, hence, the fact that the inline functions all around the .cl files in the codebase never gave problems is due either to the compilers used in our projects always inlining them or to those compilers wrongly implementing (or simply ignoring) the inline keyword (this might have been the case for the old cards used so far, since they date from quite the beginning of the OpenCL era).
Now, what should we do? After having read here and there, I got convinced that in our framework, where every kernel is compiled by basically creating a single source file with all needed functions above the kernel(s), we should really mark OpenCL functions to be used in kernels as static inline. It should first be checked that this does not break code on older architectures, whose compilers hopefully do not do the wrong thing when encountering the static inline specifier.
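As a rough sketch of the proposed layout, written in plain C so that it can be compiled on the host: all helpers sit above the entry point in one translation unit and are marked static inline. The names even_site_index, shifted_site_index and fill_indices are hypothetical and not taken from the codebase; in a real .cl file the entry point would be a __kernel function taking __global pointers.

```c
#include <assert.h>

/* Hypothetical helper, standing in for functions such as
 * get_even_st_idx_local: static inline keeps it local to the single
 * translation unit assembled for the kernel. */
static inline int even_site_index(int idx) { return 2 * idx; }

/* Hypothetical second helper that calls the first one. */
static inline int shifted_site_index(int idx, int shift) {
    return even_site_index(idx) + shift;
}

/* Stand-in for the kernel entry point; in a .cl file this would be a
 * __kernel function working on __global memory. */
void fill_indices(int *out, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        out[i] = shifted_site_index(i, 1);
    }
}
```

Since every helper has internal linkage, the build no longer depends on whether the compiler decides to inline the calls.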
@cuteri Can you give me your feedback here?
- Experiment on various architectures
- Decide what to do
- If it is the way to go, fix keywords in kernel code
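A quick grep for inline in the ocl_kernel folder shows that there are functions where the inline specifier is given after the return type. AFAIK C99 accepts declaration specifiers in any order, so this is not an error by itself, but it is unconventional and should be fixed for consistency.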
- Fix position of inline specifier