Here is a little more information on OpenMP, the standard programming model for shared memory parallelization. Variables can be shared among all threads or duplicated for each thread. Threads use shared variables to communicate with each other, as already outlined in an earlier post. The OpenMP data scope clauses private and shared declare this behavior in the following way:
private (list) declares the variables in list as private to each thread, so every thread works on its own copy;
shared (list) declares the variables in list as shared among all threads in the team.
An example could look like this:
#pragma omp parallel private (k)
If the scope is not specified, shared is the default. There are a few exceptions, though, which are private by default: the loop control variables of parallelized loops, automatic variables declared within a block inside the parallel region, and local (automatic) variables in called sub-programs.
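The scoping rules above can be sketched in a few lines of C. This is only a minimal illustration, not code from the course: the function name fill_squares and its parameters are made up for this post. The scratch variable k is declared outside the parallel region, so it would be shared by default and has to be listed in the private clause.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions for this sketch):
   fills out[i] = i * i for 0 <= i < n. */
void fill_squares(long *out, int n)
{
    int k; /* declared outside the region: shared unless listed as private */

    assert(out != NULL && n >= 0);

    /* out and n are shared: all threads see the same array and bound.
       k is private: every thread gets its own (uninitialized) copy. */
    #pragma omp parallel private(k) shared(out, n)
    {
        /* The loop control variable i of the parallelized loop is
           private by default, no clause needed. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            k = i * i;   /* safe only because k is private */
            out[i] = k;  /* each iteration writes a different element */
        }
    }
}
```

Compiled without OpenMP support the pragmas are simply ignored and the function runs sequentially with the same result. With GCC, OpenMP is typically switched on with the -fopenmp option.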
Let’s talk about another important directive, the critical directive. The code block enclosed by
#pragma omp critical [(name)]
will be executed by all threads, but only by one thread at a time. Threads must wait at the beginning of a critical region until no other thread in the team is working on a critical region with the same name. Unnamed critical directives are possible; all unnamed critical regions are treated as if they had the same name.
This leads me to another important fact that should not be overlooked. OpenMP does not protect a developer from running into deadlocks (threads waiting on locked resources that will never become free) or race conditions, where two or more threads access the same shared variable without synchronization and at least one of them modifies it. The results are non-deterministic and the programming flaws are hard to find. It is always good coding style to develop the program in a way that it could also be executed sequentially (strong equivalency). This is probably easier said than done, so testing with appropriate tools is a must.
I had the chance to participate in a parallel programming training (btw, an excellent course!) a couple of weeks ago where the Thread Checker from Intel was presented and used in exercises. It is a debugging tool for threaded applications, designed to identify deadlocks and race conditions.
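To come back to the critical directive and race conditions, here is a small sketch of how the two relate. It is an illustration under my own assumptions: the function count_multiples_of_three and the region name count_update are invented for this example. The counter is shared by default, so the unprotected increment would be a race condition; the named critical region lets only one thread at a time update it.

```c
#include <assert.h>

/* Hypothetical example (names are assumptions for this sketch):
   counts the multiples of 3 in the range [0, n). */
long count_multiples_of_three(long n)
{
    long count = 0; /* declared outside the loop: shared by default */

    assert(n >= 0);

    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        if (i % 3 == 0) {
            /* Without this critical region, several threads could
               read and write count at the same time (race condition). */
            #pragma omp critical (count_update)
            {
                count++;
            }
        }
    }
    return count;
}
```

The function also follows the strong equivalency idea: if the pragmas are ignored, it is an ordinary sequential loop that returns the same value. For a plain counter like this one, a reduction(+:count) clause on the loop would usually be the cheaper choice; the critical region is used here only to show the directive.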