Synchronization Primitives

Synchronization

Processes frequently need to synchronize and communicate with other processes. Situations where two or more processes are reading or writing some shared data and the final result depends on who runs precisely when, are called race conditions. The key to preventing trouble is mutual exclusion, that is, some way of making sure that if one process is using a shared variable or file, the other process will be excluded from doing the same thing [1].

On a single-processor system, the simplest solution is to have each process disable all interrupts just after entering its critical region and re-enable them just before leaving it. This approach is generally unattractive because it is unwise to give user processes the power to turn off interrupts. In addition, in a multicore (i.e., multiprocessor) system, disabling the interrupts of one CPU does not prevent other CPUs from interfering with operations the first CPU is performing [1].

A Dutch mathematician, T. Dekker, was the first to devise a software solution to the mutual exclusion problem. In 1981, G. L. Peterson discovered a much simpler way, rendering Dekker’s solution effectively obsolete.
Both solutions use busy waiting, continuously testing a variable until some value appears. This should usually be avoided, since it wastes CPU time. Only when there is a reasonable expectation that the wait will be short is busy waiting used. A lock that uses busy waiting is called a spin lock [1].

Modern computers, especially those designed with multiple processors in mind, have an instruction such as TSL RX,LOCK (Test and Set Lock) that works as follows. It reads the contents of the memory word lock into register RX and then stores a nonzero value at the memory address lock.
The operations of reading the word and storing into it are guaranteed to be indivisible. The CPU executing the TSL instruction locks the memory bus to prohibit other CPUs from accessing the memory until it is done.
An alternative instruction to TSL is XCHG, which exchanges the contents of two locations atomically. All Intel x86 CPUs use the XCHG instruction for low-level synchronization [1].

Both Peterson’s solution and the solutions using TSL or XCHG have the defect of requiring busy waiting. Not only does this waste CPU time, but it can also cause a variant of the priority inversion problem.
Consider a low-priority process that is in its critical region when a high-priority process becomes ready to run. The scheduler will move the low-priority process to the ready queue and run the high-priority process. When the high-priority process wants to enter the critical region, it begins busy waiting. Since the low-priority process never gets a chance to leave its critical region, the high-priority process loops forever [1].

There are some interprocess communication primitives that block instead of wasting CPU time when they are not allowed to enter their critical regions. One of the simplest is the pair sleep and wakeup. Sleep is a system call that causes the caller to block, that is, be suspended until another process wakes it up. The wakeup call has one parameter, the process to be awakened. These calls may, however, still lead to race conditions. The essence of the problem is that a wakeup sent to a process that is not (yet) sleeping is lost.

Semaphores

In 1965 E.W. Dijkstra suggested using an integer variable to count the number of wakeups saved for future use. A semaphore could have the value 0, indicating that no wakeups were saved, or some positive value if one or more wakeups are pending. Dijkstra proposed two operations, now usually called down (originally P, Proberen, try) and up (V, Verhogen, make higher) [1].

The down operation checks to see if the value is greater than 0. If so, it decrements the value and just continues. If the value is 0, the process is put to sleep. Checking the value, changing it, and possibly going to sleep, are all done as a single, indivisible atomic action [1].

The up operation increments the value. If one (or more) processes were sleeping on the semaphore, one of them is chosen by the system (e.g., at random) and is allowed to complete its down operation. After an up on a semaphore with processes sleeping on it, the semaphore will still be 0, but there will be one fewer process sleeping on it. The operation of incrementing the semaphore and waking up one process is likewise indivisible [1].

The normal way is to implement up and down as system calls, with the operating system briefly disabling all interrupts. As these actions take only a few instructions, no harm is done by disabling interrupts. If multiple CPUs are being used, each semaphore should be protected by a lock variable, with the TSL or XCHG instructions used to make sure that only one CPU at a time examines the semaphore [1].

Semaphores that are initialized to 1 and used by two or more processes to ensure that only one of them can enter its critical region at the same time are called binary semaphores. If each process does a down just before entering its critical region and an up just after leaving it, mutual exclusion is guaranteed [1].

Mutexes

When the semaphore’s ability to count is not needed, a simplified version of the semaphore, called a mutex, is sometimes used [1].

A mutex is a shared variable that can be in one of two states: unlocked or locked. When a thread needs access to a critical region, it calls mutex_lock. If the mutex is currently unlocked, the call succeeds and the calling thread is free to enter the critical region. If the mutex is already locked, the calling thread is blocked until the thread in the critical region has finished and calls mutex_unlock [1].

While mutexes and semaphores have some similarities in their implementation, they should always be used differently. The correct use of a semaphore is for signaling from one task to another. Tasks that use semaphores either signal or wait — not both [2].
While a binary semaphore may be colloquially referred to as a mutex, a true mutex has a more specific use case and definition, in that only the task that locked the mutex is supposed to unlock it [3]. A mutex is meant to be taken and released, always in that order, by each task that uses the shared resource it protects [2].

Because mutexes are so simple, they can easily be implemented in user space provided that a TSL or XCHG instruction is available [1]:

mutex_lock:
    TSL REGISTER,MUTEX        | copy mutex to register and set mutex to 1
    CMP REGISTER,#0           | was mutex zero?
    JZE ok                    | if it was zero, mutex was unlocked, so return
    CALL thread_yield         | mutex is busy; schedule another thread
    JMP mutex_lock            | try again
ok: RET                       | return to caller; critical region entered

mutex_unlock:
    MOVE MUTEX,#0             | store a 0 in mutex
    RET                       | return to caller

The implementation of a mutex varies from operating system to operating system, but on Linux it is built on top of futexes. A futex (fast user space mutex) is a feature of Linux that implements basic locking (much like a mutex) but avoids dropping into the kernel unless it really has to. Since switching to the kernel and back is quite expensive, avoiding it improves performance considerably [1].

The Standard C++ Library provides std::mutex. It is usually not accessed directly: std::unique_lock, std::lock_guard, or std::scoped_lock (since C++17) manage locking in a more exception-safe manner [5]. std::lock_guard is a mutex wrapper that provides a convenient RAII-style mechanism for owning a mutex for the duration of a scoped block [7].
Note: A common beginner error is to forget to give a lock_guard variable a name, which creates an unnamed temporary that is destroyed, and thus unlocked, immediately [7].
Note: std::lock is a function that can lock two or more mutexes at once without the risk of deadlock [9].

Condition Variables

Mutexes are good for allowing or blocking access to critical regions. Condition variables allow threads to block when some condition is not met. As an example, consider a simple producer-consumer problem with a single buffer item. When the producer has filled the buffer, it must wait until the consumer empties it before producing the next item. Similarly, when the consumer has removed an item, it must wait until the producer has produced another one [1].

The pattern is for one thread to lock a mutex, then wait on a condition variable when it cannot get what it needs. Eventually another thread will signal it and it can continue. On entry, the wait will atomically unlock the mutex it is holding. Upon successful return, after being signaled, the mutex shall have been locked again [1].

It is worth noting that condition variables (unlike semaphores) have no memory. If a signal is sent to a condition variable on which no thread is waiting, the signal is lost [1].

The statement that puts a thread to sleep should always re-check the condition to make sure it is satisfied before continuing, as a thread might have been awakened by a UNIX signal or for some other reason [1] (a spurious wakeup [2]).

Some RTOSs directly support condition variables. POSIX, for example, defines pthread calls relating to condition variables. If the RTOS does not provide condition variables, they can be constructed as an instance of the Condition Variable Pattern [2].

The Standard C++ Library provides two implementations of a condition variable: std::condition_variable and std::condition_variable_any. In both cases, they need to work with a mutex in order to provide appropriate synchronization but the former is limited to working with std::mutex [4].

  • [1] Modern Operating Systems, Fifth Edition (p137..141) – Andrew S. Tanenbaum, Herbert Bos – 2024
  • [2] Doing Hard Time (p549..551) – Bruce Powel Douglass – 1999
  • [3] opengroup: threads.html
  • [4] C++ Concurrency in Action (p69..71) – Anthony Williams – 2012
  • [5] cppreference: condition_variable.html

Monitors

In a large design, scattered mutual exclusion constructs can make life very difficult. What we want then is a replacement for the binary semaphore which, in program terms [1]:

  • Provides protection for critical regions and code
  • Encapsulates data together with operations applicable to this data
  • Is easy to use
  • Is difficult to misuse

To make it easier to write correct programs, Per Brinch Hansen (1973, [4]) and C. A. R. Hoare (1974, [5]) proposed a higher-level synchronization primitive called a monitor. A monitor is a collection of procedures, variables, and data structures that are all grouped together in a special kind of module or package [2]. Requests to use the resource are made via monitor-provided access functions; no direct use can be made of the resource itself [1].

Monitors are a programming-language construct, so the compiler knows they are special and can handle calls to monitor procedures differently from other procedure calls. Typically, the first few instructions of a procedure will check to see if any other process is currently active within the monitor. If so, the calling process will be suspended until the other process has left the monitor. It is up to the compiler to implement mutual exclusion on monitor entries, but a common way is to use a mutex or binary semaphore [2].

Monitors are a language construct which C does not have [2]. Programming languages that have supported monitors include [3]:

  • Ada since Ada 95 (as protected objects)
  • C# (and other languages that use the .NET Framework)
  • Java (via the wait and notify methods)
  • Go
  • Python (via threading.Condition object)
  • Ruby

By adding the keyword synchronized to a method declaration, Java guarantees that once a thread has started executing a method, no other thread will be allowed to start executing any other synchronized method of that object [2].
Synchronized methods differ from classical monitors in an essential way: Java does not have condition variables built in. Instead, it offers two procedures, wait and notify, which are the equivalent of sleep and wakeup, except that when they are used inside synchronized methods, they are not subject to race conditions [2].