Frequently asked questions
==========================

Common problems and questions related to using and developing pocl
are listed here.

Using pocl
----------

.. _supported-compilers:

Supported compilers and compiler combinations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PoCL usually uses two different compilers (though may be built
using only one). One is used to compile C and C++ files - this is usually
the "system compiler". It's specified by CMAKE_C{,XX}_COMPILER variables
to cmake, but usually just left to default. The second compiler is used
to build OpenCL files - this is always clang+llvm. It's specified by
-DWITH_LLVM_CONFIG=<path> to cmake.

You may use clang as both "system" and OpenCL compiler for pocl.
Note however that pocl uses the CXX_FLAGS *which the 2nd compiler (Clang/LLVM)
was built with*, to build parts of pocl that link with that compiler. This
may cause some issues, if you try to build pocl with a different compiler
as the one used to build the 2nd compiler - because gcc and clang are not
100% compatible with each other in flags. So far though we've only seen
warnings about unknown flags, not actual bugs.

Anyway, the most trouble-free solution is to use the same "system" compiler
to build pocl, as the one that was used to build the 2nd compiler. Note that
while most Linux distributions use GCC to build their Clang/LLVM,
the official downloads from llvm.org are built using Clang.

PoCL is not listed by clinfo / is not found
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Occasionally, proprietary implementations rewrite the ICD loader by their own
version. E.g. Intel SDK installer silently replaces
``/usr/lib/x86_64-linux-gnu/libOpenCL.so`` with a link to
``/etc/alternatives/opencl-libOpenCL.so`` which itself is a link to the intel's
libOpenCL implementation. The fix is to remove the symlinks manually
and reinstall the ICD loader after which both pocl and the Intel SDK
can be used through the ICD loader.

Deadlocks (freezes) on FreeBSD
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The issue here is that a library may not initialize the threading on BSD
independently. 
This will cause pocl to stall on some uninitialized internal mutex.
See: http://www.freebsd.org/cgi/query-pr.cgi?pr=163512

A simple work-around is to compile the OpenCL application with "-pthread", 
but this of course cannot be enforced from pocl, especially if an ICD loader 
is used. The internal testsuite works only if "-pthread" is passed 
to ./configure in CFLAGS and CXXFLAGS, even if an ICD loader is used.

clReleaseDevice or clCreateImage missing when linking against -lOpenCL (ICD)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These functions were introduced in OpenCL 1.2. If you have built your ICD
loader against 1.1 headers, you cannot access the pocl implementations of
them because they are missing from the ICD dispatcher.

The solution is to rebuild the ICD loader against OpenCL 1.2 headers.

See: https://github.com/pocl/pocl/issues/27

"Two passes with the same argument (-barriers) attempted to be registered!"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you see this error::

  Two passes with the same argument (-barriers) attempted to be registered!
  UNREACHABLE executed at <path..>/include/llvm/Support/PassNameParser.h:73!

It's caused by initializers of static variables (like pocl's LLVM Pass names)
called more than once. This happens for example when you link libpocl twice
to your program, or you link libpocl and another library that dynamically links
to a different LLVM.

One way that could happen, is building pocl with ``-DENABLE_ICD=0`` while having
hwloc "plugins" package installed (with the opencl plugin). What happens is:

* libOpenCL.so gets built
* program gets linked to the built libOpenCL.so; that is linked to hwloc
* at runtime, hwloc will try to open the hwloc-opencl plugin; that links to
  system-installed libOpenCL.so (usually the ICD loader);
* the ICD loader will try to dlopen libpocl.so -> you get the error.

Although PoCL now has a workaround for the hwloc case, this will not work
in other cases; another solution is to uninstall the hwloc "plugins" package.

Why is pocl slow?
^^^^^^^^^^^^^^^^^

If pocl's kernel build seems really slow, it is very possible you have
built your LLVM with Debug+Asserts on (not configure --enable-optimized).
This should result in up to 10x kernel compiler slow downs. You can
really feel it when running 'make check', for example.

The kernel compiler cache often removes that overhead when you
run your OpenCL app the next time.

If pocl is otherwise slower than other OpenCL implementations, it's normal. 
pocl is known to run certain benchmarks faster, certain ones slower, 
when comparing against the Intel and AMD OpenCL SDKs. We hope to improve 
the performance in each release, so if you encounter performance 
regressions (an older pocl/LLVM version used to run an app faster), 
please report a bug.

pocl source code
----------------

Why C99 in host library?
^^^^^^^^^^^^^^^^^^^^^^^^

The kernel compiler passes and some of the driver implementations are in C++11
and it's much faster to implement things in C++11. Why require using C99 in
the host library?

pocl is meant to be very portable to various type of devices, also
to those with very little resources (no operating system at all and with pruned
runtime libraries). C has better portability to low end CPUs and VMs.

Thus, in order for a CPU to act as an OpenCL host without online kernel
compilation support, only C99 support is required from the target,
no C++ compiler, runtime or STL is needed. Also, C programs are said to
sometimes produce more "lightweight" binaries, but that is debatable.
Benchmarks
==============

CLPeak issues
----------------

Currently (Dec 2017) does not work. First, there's a global memory size
detection bug in CLPeak which makes it fail on all OpenCL calls (this
can be workarounded by using POCL_MEMORY_LIMIT=1). Second, compilation
takes forever - this can't be fixed in pocl and needs to be fixed in
either CLPeak or LLVM. CLPeak sources use recursive macros to create
a giant stream of instructions. Certain optimization passes
in LLVM seem to explode exponentially on this code. The second
consequence of giant instruction stream is, it easily overflows the
instruction caches of a CPU, therefore CLPeak results are highly
dependent on whether the compiler manages to fit the code into icache,
perhaps using loop re-rolling, and as such are not a reliable measure
of peak device FLOPS.

Luxmark issues
---------------

* Using the binary downloaded from www.luxmark.info might lead to pocl
  abort on creating cache directory. This is not a bug in Pocl, it's a
  consequence of the two programs (pocl & luxmark) having been compiled
  with different libstdc++. Using a distribution packaged Luxmark
  fixes this problem.

* It's recommended to remove luxmark cache (~/.config/luxrender.net)
  after updating pocl version.

* There's another bug (http://www.luxrender.net/mantis/view.php?id=1640)
  - it crashes after compiling kernels, because it doesn't recognize
  an OpenCL device. This requires editing scenes/<name>/render.cfg,
  you must add ``opencl.cpu.use = 0`` and ``film.opencl.device = 0``

* All scenes (Microphone, Luxball and Hotel) should compile & run
  with LLVM 6 and newer.