Redis: I like you, but you're crazy


Dear Redis,

I like you, but you're crazy

Don’t get me wrong, Redis is awesome. We absolutely love it at Stack Exchange, but it can have some unexpected behavior.

Okay, maybe not totally unexpected, but unexpected to me.

The other day I found myself writing a relatively simple testbed for processing a Redis queue (via Booksleeve). Just a super simple while loop calling BLPOP and spitting out the results.
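For a rough idea of the shape of that loop, here is a minimal sketch. The original testbed used Booksleeve from C#; this version is Python, and everything in it is an assumption made for illustration: the function name `process_queue`, the `max_items` cutoff, and a `client` object that behaves like redis-py's `redis.Redis`, whose `blpop(keys, timeout=0)` returns a `(key, value)` pair or `None` when a non-zero timeout expires.

```python
def process_queue(client, key, handle, max_items=None):
    """Pop items from a Redis list with BLPOP and pass each one to `handle`.

    `client` is assumed to expose a redis-py style `blpop(keys, timeout=0)`
    that blocks until an item arrives and returns a `(key, value)` pair,
    or None if a non-zero timeout expires first.
    """
    processed = 0
    while max_items is None or processed < max_items:
        reply = client.blpop(key, timeout=0)  # 0 means block indefinitely
        if reply is None:
            continue  # only possible when a non-zero timeout is used
        _key, value = reply
        handle(value)
        processed += 1
    return processed
```

With redis-py installed and a server running, something like `process_queue(redis.Redis(), "queue", print)` would print each item as it arrives; the key name `"queue"` is hypothetical.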

I ran into some… well, “interesting” behavior. And to be completely honest, it was mainly due to assumptions I had made about how Redis handles connections.

However, the edge case I ran into was most definitely not obvious, so I figured I’d document it here, partially in the hopes that I can help somebody out, but mainly just because I know I’m going to forget this and probably run into it again six months from now.

I wrote up what I was experiencing in a Stack Overflow question. Go ahead and read it. I’ll wait.

Done? Great! Anyway, I finally figured out what was going on. A few factors led to this behavior (as far as I understand it):

1 - The Redis server’s timeout setting was set to 0

This means that Redis won’t time out idle connections. Huh? “Why would you want that?” you ask. Well, it turns out that this is a pretty good default for Redis (think: PUB/SUB). Sometimes there are clients connected to Redis for days on end with no activity, just waiting for an event to be published.

2 - The tcp-keepalive setting was set to 0

This is closely tied to the above. The way I understand it, the tcp-keepalive setting tells the OS to periodically send a TCP-level “ping” of sorts to the client to detect whether it’s still connected. Since this setting was set to zero, this feature was disabled.

And here’s the linchpin:

3 - Visual Studio’s “Stop Debugging” command (Shift+F5) is absolutely brutal.

The whole process is killed (no GC, no cleanup, most likely a TerminateProcess call). The process and anything it was doing are just nuked and the memory reclaimed (any die-hard Win32 guys out there, feel free to correct me if I’m wrong).

With both timeout and tcp-keepalive set to 0:

  • Idle client connections are never killed.
  • The server can’t differentiate between an idle client and a dead one.
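In redis.conf terms, the combination looks roughly like the fragment below. The commented-out alternative at the end is only an illustration of how dead peers could eventually be detected; the value 60 is an assumption, not something taken from the original setup.

```
# Never close idle client connections.
timeout 0

# Never send TCP keepalive probes to clients.
tcp-keepalive 0

# Possible alternative (assumed value): enable keepalive probes,
# with a period of about 60 seconds, so dead connections get dropped.
# tcp-keepalive 60
```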

Which meant that whenever I used the “Stop Debugging” command on the queue processor, the last BLPOP that had been issued was still valid as far as Redis was concerned.

After hitting F5 and Shift+F5 a couple of times, I had built up quite a few pending BLPOP commands from dead connections, which led to the odd results.
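To make the effect concrete, here is a toy model in Python. It does not talk to Redis at all: `ToyServer` and the connection names are made up for illustration, and the only real behavior it borrows is that when several clients are blocked on the same key, the one that has been waiting longest is served first. Under that assumption, every dead connection sits in line ahead of the live one.

```python
from collections import deque


class ToyServer:
    """Toy model of one Redis list plus the clients blocked on it with BLPOP."""

    def __init__(self):
        self.items = deque()    # values waiting in the list
        self.blocked = deque()  # connections waiting in BLPOP, oldest first
        self.delivered = []     # (connection name, value) pairs

    def blpop(self, conn):
        # Serve immediately if the list has data, otherwise block the caller.
        if self.items:
            self.delivered.append((conn, self.items.popleft()))
        else:
            self.blocked.append(conn)

    def rpush(self, value):
        # The longest-waiting blocked connection is served first. The server
        # cannot tell that it may belong to a process that no longer exists.
        if self.blocked:
            self.delivered.append((self.blocked.popleft(), value))
        else:
            self.items.append(value)


server = ToyServer()
server.blpop("dead-1")  # debug session 1, killed with Stop Debugging
server.blpop("dead-2")  # debug session 2, killed the same way
server.blpop("live")    # the session that is actually running

for value in ["a", "b", "c"]:
    server.rpush(value)

received_by_live = [v for conn, v in server.delivered if conn == "live"]
print(received_by_live)  # only "c" reaches the live process
```

In this model the first two items are handed to connections nobody is listening on, which is one plausible way for a live queue processor to see fewer items than were pushed.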

Moral of the story? Understand the configuration and behavior of the services you depend on, and don’t make assumptions.
