Posted by: Airtower | 2010-06-25

Events and threads: A mutex is not enough!

I wrote a little UDP server that would process messages like this:

  1. Incoming data on the UDP socket creates an event in the GLib main loop.
  2. The event callback function starts a handler thread from a GThreadPool.
  3. The thread locks a mutex for the socket, reads from the socket, maybe sends a reply (depends on the received message), unlocks the mutex and returns.

Looks harmless? It isn’t. The program used a signal callback to stop the main loop on SIGTERM. This would allow any remaining threads to finish their work before the program terminates. When I tested that, I was surprised that the server would not stop. I had to add a lot of debug output until I found the reason.

The problem was that the messages were read from the socket in the threads. The event callback returned after starting the thread, and the main loop continued to run. In the short time between thread start and socket read, the event fired again and again, because the unread data was still available on the socket. That was enough to launch all the handler threads in my four-thread pool.
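The effect is easy to reproduce outside GLib. Here is a minimal sketch in Python, assuming the standard `selectors` module as a stand-in for the main loop's socket watch (the function name `count_ready_events` is made up for this illustration, and none of this is the server's actual code). It sends one datagram and then polls several times without reading it:

```python
import selectors
import socket


def count_ready_events(polls=4):
    """Send one datagram, poll `polls` times without reading, then read once.

    Returns (how many polls reported the socket readable,
             how many ready sockets one more poll reports after the read).
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    selector = selectors.DefaultSelector()
    selector.register(receiver, selectors.EVENT_READ)
    try:
        sender.sendto(b"hello", receiver.getsockname())
        fired = 0
        for _ in range(polls):
            # Nobody has read the datagram yet, so readiness is reported
            # again on every poll (level-triggered behaviour).
            if selector.select(timeout=1.0):
                fired += 1
        receiver.recvfrom(1024)  # consume the datagram
        after_read = len(selector.select(timeout=0))
        return fired, after_read
    finally:
        selector.close()
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(count_ready_events())
```

Every poll before the read reports the socket as readable, and the poll after the read reports nothing. In the server, each of those extra "readable" reports started another handler thread.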

The result: The first thread gets the mutex, reads the message, does its work, unlocks the mutex and returns. While the first thread reads, three more threads are waiting for the mutex. When the first thread is done, one of the others grabs the mutex and tries to read. I used blocking functions, so it waits until a new message arrives. A new message fires an event, which starts another thread. I suppose you get the idea. All handler threads were trying to process incoming messages simultaneously!

By the way, the SIGTERM callback actually worked. After SIGTERM, no new events were dispatched and therefore no new handler threads were started, and after receiving enough messages the server did indeed terminate, because each thread returned after processing its message.

This is why I wrote that a mutex is not enough. Yes, the mutex prevented the threads from reading from the socket at the same time, but it did not stop the event from firing. The solution was, of course, to read the message in the event callback and hand the data to the handler thread. Lesson learned: Don’t just ask “Could these threads try to access something at the same time?” and throw in a mutex if the answer is yes. Threaded operation can create a lot of unintended connections, and it’s easy to overlook one of them. Take care, and you can avoid a long debugging session like the one I had today. 😉
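The fixed design can be sketched like this. It is a Python analogue rather than the GLib code: `selectors` plays the part of the main loop and `ThreadPoolExecutor` the part of the GThreadPool, and the names `serve`, `handle_message` and `demo` as well as the uppercase "reply" are invented for the example. The point is only that the read happens in the event loop, so each datagram produces exactly one handler job:

```python
import selectors
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

send_lock = threading.Lock()  # the mutex now only guards sending replies


def handle_message(sock, data, addr):
    """Worker: gets data that was already read; never reads the socket."""
    reply = data.upper()  # placeholder for the real message processing
    with send_lock:
        sock.sendto(reply, addr)
    return reply


def serve(sock, pool, expected):
    """Event loop: read each datagram here, then hand it to the pool."""
    selector = selectors.DefaultSelector()
    selector.register(sock, selectors.EVENT_READ)
    futures = []
    try:
        while len(futures) < expected:
            if not selector.select(timeout=1.0):
                break  # nothing arrived in time; give up (demo only)
            # Reading before returning to the loop clears the readiness,
            # so the event cannot fire again for the same datagram.
            data, addr = sock.recvfrom(65535)
            futures.append(pool.submit(handle_message, sock, data, addr))
    finally:
        selector.close()
    return futures


def demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)
    try:
        for word in (b"ping", b"pong", b"stop"):
            client.sendto(word, server.getsockname())
        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = serve(server, pool, expected=3)
            handled = sorted(f.result() for f in futures)
        replies = sorted(client.recvfrom(1024)[0] for _ in range(3))
        return handled, replies
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(demo())
```

Stopping the loop now behaves the way I originally expected: no worker is ever parked in a blocking read, so the pool drains as soon as the jobs already submitted are done.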
