13 Concurrency

Writing clean concurrent programs is hard—very hard. It is much easier to write code that executes in a single thread. It is also easy to write multithreaded code that looks fine on the surface but is broken at a deeper level. Such code works fine until the system is placed under stress.

Why Concurrency?

Concurrency is a decoupling strategy. It helps us decouple what gets done from when it gets done. Decoupling what from when can dramatically improve both the throughput and structures of an application. From a structural point of view the application looks like many little collaborating computers rather than one big main loop.
Consider, for example, the standard “Servlet” model of Web applications. These systems run under the umbrella of a Web or EJB container that partially manages concurrency for you.
But structure is not the only motive for adopting concurrency. Some systems have response time and throughput constraints that require hand-coded concurrent solutions.

Myths and Misconceptions:

Concurrency is hard. If you aren’t very careful, you can create some very nasty situations. Consider these common myths and misconceptions:

  • Concurrency always improves performance.
  • Design does not change when writing concurrent programs.
  • Understanding concurrency issues is not important when working with a container such as a Web or EJB container.

Here are a few more balanced sound bites regarding writing concurrent software:

  • Concurrency incurs some overhead, both in performance as well as writing additional code.
  • Correct concurrency is complex, even for simple problems.
  • Concurrency bugs aren’t usually repeatable, so they are often ignored as one-offs
    instead of the true defects they are.
  • Concurrency often requires a fundamental change in design strategy.

Challenges

What makes concurrent programming so difficult? Even a single line of Java code that updates shared state executes as several smaller steps (read, modify, write). There are many possible paths that two threads can take through those steps, and some of those paths generate incorrect results.
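To make this concrete, here is a minimal sketch (class and field names are illustrative) of the classic lost-update race: two threads each increment a shared field, and because `++` is not atomic, some increments can be lost:

```java
public class RaceDemo {
    private static int lastIdUsed = 0; // shared, unguarded field

    static int nextId() {
        return ++lastIdUsed; // looks like one step, but is really read-increment-write
    }

    static int race(int perThread) throws InterruptedException {
        lastIdUsed = 0;
        Runnable task = () -> { for (int i = 0; i < perThread; i++) nextId(); };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        return lastIdUsed; // often less than 2 * perThread: some updates were lost
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("ids generated: " + race(100_000));
    }
}
```

The result is nondeterministic: on any given run the final count may equal 200,000 or fall short of it, which is exactly why such bugs hide so well.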

Concurrency Defense Principles

What follows is a series of principles and techniques for defending your systems from the
problems of concurrent code.

  • Single Responsibility Principle
    The SRP states that a given method/class/component should have a single reason to change.
    Here are a few things to consider:
  • Concurrency-related code has its own life cycle of development, change, and tuning.
  • Concurrency-related code has its own challenges, which are different from and often more difficult than those of nonconcurrent code.
  • The number of ways in which miswritten concurrency-based code can fail makes it challenging enough without the added burden of surrounding application code. Recommendation: Keep your concurrency-related code separate from other code.
  • Corollary: Limit the Scope of Data
    Two threads modifying the same field of a shared object can interfere with each other, causing unexpected behavior. One solution is to use the synchronized keyword to protect a critical section in the code that uses the shared object. It is important to restrict the number of such critical sections. The more places shared data can get updated, the more likely:
  • You will forget to protect one or more of those places—effectively breaking all code that modifies that shared data.
  • There will be duplication of effort required to make sure everything is effectively guarded.
  • It will be difficult to determine the source of failures, which are already hard enough to find.
    Recommendation: Take data encapsulation to heart; severely limit the access of any
    data that may be shared.
  • Corollary: Use Copies of Data
    A good way to avoid shared data is to avoid sharing the data in the first place. In some situations it is possible to copy objects and treat them as read-only. In other cases it might be possible to copy objects, collect results from multiple threads in these copies and then merge the results in a single thread.
  • Corollary: Threads Should Be as Independent as Possible
    Consider writing your threaded code such that each thread exists in its own world, sharing no data with any other thread. Each thread processes one client request, with all of its required data coming from an unshared source and stored as local variables.
    Recommendation: Attempt to partition data into independent subsets that can be operated on by independent threads, possibly in different processors.
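As a minimal sketch of the first corollary (limit the scope of data), the shared field below is encapsulated in one class, and the only code that touches it is a single small synchronized method. Names are illustrative:

```java
public class IdGenerator {
    private long lastIdUsed = 0; // shared data lives in exactly one place

    // The one critical section that updates the shared field.
    public synchronized long nextId() {
        return ++lastIdUsed;
    }
}
```

Because every update funnels through one guarded method, there is exactly one place to get the protection right, and nothing to forget elsewhere.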
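The copies-of-data corollary can be sketched like this: each task receives its own private copy of a slice of the input, and a single thread merges the partial results, so no data is ever shared between worker threads. The class name and chunking scheme are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CopyAndMerge {
    static int sumInParallel(List<Integer> data, int chunks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        List<Future<Integer>> futures = new ArrayList<>();
        int size = data.size();
        for (int c = 0; c < chunks; c++) {
            int from = c * size / chunks, to = (c + 1) * size / chunks;
            // Each task works on its own private copy of a slice.
            List<Integer> copy = new ArrayList<>(data.subList(from, to));
            futures.add(pool.submit(() -> copy.stream().mapToInt(Integer::intValue).sum()));
        }
        int total = 0;
        for (Future<Integer> f : futures) total += f.get(); // merge in the calling thread
        pool.shutdown();
        return total;
    }
}
```

The cost of the extra copies is often dwarfed by the synchronization they eliminate.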

Know Your Library

Java 5 offers many improvements for concurrent development over previous versions. There are several things to consider when writing threaded code in Java 5:

  • Use the provided thread-safe collections.
  • Use the executor framework for executing unrelated tasks.
  • Use nonblocking solutions when possible.
  • Several library classes are not thread safe.
  • Thread-Safe Collections
    Recommendation: Review the classes available to you. In the case of Java, become familiar with java.util.concurrent, java.util.concurrent.atomic, and java.util.concurrent.locks.
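A brief sketch of the first two points together: a thread-safe collection (`ConcurrentHashMap`) updated atomically by tasks submitted through the executor framework, with no hand-rolled locking. The class and method names are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LibraryDemo {
    static int countHits(int n) throws InterruptedException {
        // Thread-safe map: merge() performs an atomic read-modify-write.
        Map<String, Integer> hits = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < n; i++) {
            pool.submit(() -> hits.merge("page", 1, Integer::sum));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return hits.get("page");
    }
}
```

Contrast this with the earlier unguarded counter: the library class absorbs the synchronization burden for you.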

Know Your Execution Models

There are several different ways to partition behavior in a concurrent application. To discuss them we need to understand some basic definitions.
Bound Resources – Resources of a fixed size or number used in a concurrent environment.
Mutual Exclusion – Only one thread can access shared data or a shared resource at a time.
Starvation – One thread or a group of threads is prohibited from proceeding for an excessively long time or forever.
Deadlock – Two or more threads waiting for each other to finish.
Livelock – Threads in lockstep, each trying to do work but finding another “in the way.”
Given these definitions, we can now discuss the various execution models used in
concurrent programming.

  • Producer-Consumer
    One or more producer threads create some work and place it in a buffer or queue. One or more consumer threads acquire that work from the queue and complete it. The queue between the producers and consumers is a bound resource.
  • Readers-Writers
    When you have a shared resource that primarily serves as a source of information for readers, but which is occasionally updated by writers, throughput is an issue.
    The challenge is to balance the needs of both readers and writers to satisfy correct operation, provide reasonable throughput, and avoid starvation. A simple strategy makes writers wait until there are no readers before allowing the writer to perform an update. If there are continuous readers, however, the writers will be starved. On the other hand, if there are frequent writers and they are given priority, throughput will suffer.
  • Dining Philosophers
    Imagine a number of philosophers sitting around a circular table. A fork is placed to the left of each philosopher. A philosopher cannot eat unless he is holding two forks. If the philosopher to his right or left is already using one of the forks he needs, he must wait until that philosopher finishes eating and puts the forks back down. Replace philosophers with threads and forks with resources and this problem is similar to many enterprise applications in which processes compete for resources. Unless carefully designed, systems that compete in this way can experience deadlock, livelock, and degraded throughput and efficiency.
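The producer-consumer model can be sketched with `java.util.concurrent.BlockingQueue`, where the queue itself is the bound resource: `put()` blocks when the queue is full and `take()` blocks when it is empty, so the coordination is handled by the library. Names are illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumer {
    static int produceAndConsume(int n) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10); // the bound resource
        int[] sum = {0};

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) queue.put(i); // blocks when full
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) sum[0] += queue.take(); // blocks when empty
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        producer.start(); consumer.start();
        producer.join(); consumer.join(); // join() makes sum[0] safely visible here
        return sum[0];
    }
}
```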
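For readers-writers, `java.util.concurrent.locks.ReentrantReadWriteLock` implements the basic strategy described above: many readers may hold the read lock at once, while the write lock is exclusive. A sketch (class name is illustrative):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SharedValue {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int value;

    public int read() {
        lock.readLock().lock(); // many readers may hold this concurrently
        try { return value; } finally { lock.readLock().unlock(); }
    }

    public void write(int v) {
        lock.writeLock().lock(); // exclusive: waits for all readers to drain
        try { value = v; } finally { lock.writeLock().unlock(); }
    }
}
```

Passing `true` to the `ReentrantReadWriteLock` constructor requests fair ordering, which mitigates writer starvation at some cost in throughput.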
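One well-known fix for the dining-philosophers deadlock is to impose a global order on resource acquisition, so a circular wait can never form. A sketch (names and the ordering scheme are illustrative), where every philosopher always takes the lower-numbered fork first:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class Philosophers {
    static int dine(int philosophers, int rounds) throws InterruptedException {
        ReentrantLock[] forks = new ReentrantLock[philosophers];
        for (int i = 0; i < philosophers; i++) forks[i] = new ReentrantLock();
        AtomicInteger meals = new AtomicInteger();

        Thread[] threads = new Thread[philosophers];
        for (int p = 0; p < philosophers; p++) {
            final int id = p;
            threads[p] = new Thread(() -> {
                int left = id, right = (id + 1) % philosophers;
                int first = Math.min(left, right), second = Math.max(left, right);
                for (int r = 0; r < rounds; r++) {
                    forks[first].lock(); // always take the lower-numbered fork first
                    try {
                        forks[second].lock();
                        try {
                            meals.incrementAndGet(); // "eat"
                        } finally {
                            forks[second].unlock();
                        }
                    } finally {
                        forks[first].unlock();
                    }
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) t.join();
        return meals.get();
    }
}
```

If each philosopher instead grabbed the left fork first, all of them could each hold one fork and wait forever for the other.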

Most concurrent problems you will likely encounter will be some variation of these three problems.
Recommendation: Learn these basic algorithms and understand their solutions.

Beware Dependencies Between Synchronized Methods

Dependencies between synchronized methods cause subtle bugs in concurrent code. Recommendation: Avoid using more than one method on a shared object. There will be times when you must use more than one method on a shared object. When this is the case, there are three ways to make the code correct:

  • Client-Based Locking—Have the client lock the server before calling the first method
  • Server-Based Locking—Within the server create a method that locks the server, calls all the methods, and then unlocks.
  • Adapted Server—create an intermediary that performs the locking.
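A brief sketch of server-based locking, using the classic dependent pair `hasNext()`/`next()`: a second thread can interleave between the two calls and exhaust the iterator. Here the server exposes one method that performs both calls under a single lock (class and method names are illustrative):

```java
import java.util.Iterator;
import java.util.List;

public class LockedIterator {
    private final Iterator<Integer> it;

    public LockedIterator(List<Integer> data) {
        this.it = data.iterator();
    }

    // Both dependent calls happen inside one critical section, so no
    // other thread can slip in between the check and the fetch.
    public synchronized Integer getNextOrNull() {
        return it.hasNext() ? it.next() : null;
    }
}
```

An adapted server works the same way, except the locking method lives in an intermediary class when you cannot change the server itself.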

Keep Synchronized Sections Small

The synchronized keyword introduces a lock. All sections of code guarded by the same lock are guaranteed to have only one thread executing through them at any given time. Locks are expensive because they create delays and add overhead. Recommendation: Keep your synchronized sections as small as possible.
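A small sketch of the recommendation (names are illustrative): the expensive work, formatting a log line, happens outside the lock, and the critical section shrinks to the single mutation of shared state:

```java
public class AuditLog {
    private final StringBuilder log = new StringBuilder(); // shared, guarded by `this`

    public void record(String event) {
        // Do the expensive formatting outside the lock...
        String line = java.time.Instant.now() + " " + event + "\n";
        // ...and keep the critical section down to the one shared mutation.
        synchronized (this) {
            log.append(line);
        }
    }

    public synchronized String dump() {
        return log.toString();
    }
}
```

Had the whole of `record()` been synchronized, every caller would serialize on the timestamp formatting as well, for no correctness benefit.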

Writing Correct Shut-Down Code Is Hard

Writing a system that is meant to stay live and run forever is different from writing something that works for a while and then shuts down gracefully. Graceful shutdown can be hard to get correct. Common problems involve deadlock, with threads waiting for a signal to continue that never comes. Recommendation: Think about shut-down early and get it working early. It’s going to take longer than you expect.
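One common shutdown pattern, sketched with the executor framework (the method name and timeouts are illustrative): stop accepting new work, wait for running tasks to finish, then interrupt any stragglers rather than waiting forever:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public class Shutdown {
    static boolean shutDownGracefully(ExecutorService pool) throws InterruptedException {
        pool.shutdown(); // stop accepting new tasks; running tasks continue
        if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
            pool.shutdownNow(); // interrupt tasks that are still running
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return true;
    }
}
```

The key detail is the timeout: a shutdown that waits unconditionally inherits every deadlock the workers can produce.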

Testing Threaded Code

Proving that code is correct is impractical. Testing does not guarantee correctness. However, good testing can minimize risk.
Recommendation: Write tests that have the potential to expose problems and then run them frequently, under different programmatic configurations, system configurations, and loads. If a test ever fails, track down the failure. Don’t ignore a failure just because the tests pass on a subsequent run.

  • Treat Spurious Failures as Candidate Threading Issues
    Threaded code causes things to fail that “simply cannot fail.”
    Recommendation: Do not ignore system failures as one-offs.
  • Get Your Nonthreaded Code Working First
    Recommendation: Do not try to chase down nonthreading bugs and threading bugs at the same time. Make sure your code works outside of threads.
  • Make Your Threaded Code Pluggable
    Recommendation: Make your thread-based code especially pluggable so that you can run it in various configurations.
  • Make Your Threaded Code Tunable
    Getting the right balance of threads typically requires trial and error. Early on, find ways to time the performance of your system under different configurations. Allow the number of threads to be easily tuned.
  • Run with More Threads Than Processors
    Things happen when the system switches between tasks. To encourage task swapping, run with more threads than processors or cores. The more frequently your tasks swap, the more likely you’ll encounter code that is missing a critical section or causes deadlock.
  • Run on Different Platforms
    Different operating systems have different threading policies, each of which impacts the code’s execution. Multithreaded code behaves differently in different environments.
    Recommendation: Run your threaded code on all target platforms early and often.
  • Instrument Your Code to Try and Force Failures
    It is normal for flaws in concurrent code to hide. How might you increase your chances of catching such rare occurrences? You can instrument your code and force it to run in different orderings.
    There are two options for code instrumentation:
    • Hand-coded
    • Automated
  • Hand-Coded
    You can insert calls to Object.wait(), Thread.sleep(), Thread.yield(), and Thread.setPriority() in your code by hand.
    There are many problems with this approach:
    • You have to manually find appropriate places to do this.
    • How do you know where to put the call and what kind of call to use?
    • Leaving such code in a production environment unnecessarily slows the code down.
    • It’s a shotgun approach. You may or may not find flaws. Indeed, the odds aren’t with you.
  • Automated
    You could use tools like an Aspect-Oriented Framework, CGLIB, or ASM to programmatically instrument your code.
    The point is to jiggle the code so that threads run in different orderings at different times. The combination of well-written tests and jiggling can dramatically increase the chance of finding errors.
    Recommendation: Use jiggling strategies to ferret out errors.