About Projects Blog Tags Email GitHub Twitter Press Feed

23 Jul 2013
Ruby: Case versus If (and a wee bit about Unless)

My awesome brother-in-law is learning coding generally and Ruby specifically lately, so we decided to check out ScreenHero and try some remote pairing.

One thing led to another, and next think you know he was asking me how you know when to use a case statement versus when to use if/elsif. We chatted about logic and clean design, and then started wondering if there really was a performance difference between case and if/elsif.

(Yes, yes, this sort of micro-optimization is usually way less important than readable and easy to maintain code. But it’s still fun to think about!)

“If you find you’re looking to optimize that intensely you probably don’t want an interpreted language.

I generally try to avoid case statements simply because it makes it easy for me or another programmer to come along in the future and add another case, increasing the branching in a bit of code that should probably have only done one thing in the first place.” - Jonan S.

Yep. Agreed on both points. And now that we’ve acknowledged that this exploration is just for funsies, let’s move on.

Midwire ran a few benchmarks and concluded that if/elsif is faster than case because case “implicitly compares using the more expensive === operator”.

It’s true - case totally uses threequal under the hood! (I mean, there’s a reason we say that threequal is for case equality.) But that doesn’t feel like the end of the story, so let’s see what else we can figure out here.

In some languages, case statements are implemented as hash tables to gain performance over if/elsif chains. Is that true in Ruby? If not, what is going on here? And where can we look for the answers?

THE RUBY DOCS AIN’T GONNA CUT IT THIS TIME

To be honest, I started this exploration by looking through the Ruby docs and clicking to toggle source on case, but there was nothing there. If and case are so fundamental that they can’t actually be defined as functions and thus can’t be explained by looking at the Ruby sourcecode. We have to look at the parser and compiler to figure out what’s going on with them.

Why? Well, scroll down to SICP Exercise 1.6.

Let us assume for the sake of argument that Ruby had if as a language keyword, but wanted to define case in terms of if/elsif/else.

Let’s pretend cond is just if, and rewrite this example…

(define (new-if predicate then-clause else-clause)
  (cond (predicate then-clause)
        (else else-clause)))

…more Ruby-ishly:

def new_if(predicate, then_clause, else_clause)
  if predicate
    then_clause
  else
    else_clause
  end
end

Well, okay. Now, what if we tried to use it like this?

irb(main):012:0> x = 0
=> 0
irb(main):013:0> new_if(true, x+=1, x+=2)
=> 1
irb(main):014:0> x
=> 3

Whoops! The then_clause and else_clause get evaluated before being passed in as arguments. That’s no good. Okay, fine, what if we set up new_if to take clauses as lambdas instead?

def new_if(predicate, then_clause, else_clause)
  if predicate
    then_clause.call
  else
    else_clause.call
  end
end

Cool, but what about scoping? We want the clause that ends up evaluated to be able to change variables from the scope outside the conditional. Good thing that’s not actually a problem in Ruby, where closures include bindings and don’t just close over values from the outer scope.

irb(main):015:0> x = 0
=> 0
irb(main):016:0> new_if(true, lambda { x+=1 }, lambda { x+=2 })
=> 1
irb(main):017:0> x
=> 1

If that seems deeply weird to you, you might want to check out this post on how bindings and closures work in Ruby.

But anyways, we know that case doesn’t have to take lambdas in Ruby, so we know that this can’t be how it works. And if it’s not a function defined in Ruby, but is a language keyword instead, we can’t really quite figure out what’s going on with it just by reading through the docs as usual. Bah, humbug. Where should we look next?

LUCKY FOR US, MRI IS OPEN SOURCE

When in doubt, it’s time to refer to primary source materials.

Okay, I see something about NODE_CASE in compile.c. Looks like we go through each NODE_WHEN, deal with array predicates and such, and ultimately add instructions for checkmatch and branchif, like so:

ADD_INSN1(cond_seq, nd_line(vals), checkmatch, INT2FIX(VM_CHECKMATCH_TYPE_CASE | VM_CHECKMATCH_ARRAY));
ADD_INSNL(cond_seq, nd_line(vals), branchif, l1);

So, what’s VM_CHECKMATCH_TYPE_CASE? Turns out it’s check ‘patten === target’.

And just look a bit further down in insns.def and we find DEFINE_INSN checkmatch. Of course, it turns out that checkmatch calls check_match, which checks case equality with idEqq, which (again) turns out to be :===.

That was pretty neat. Now, what about this branchif business? Well, it seems to be defined here. Seems pretty straightforward. It looks like both case and if/elsif are implemented as sequences of conditionals and gotos in Ruby, so we can’t expect to get the sort of performance boost with case in Ruby like we see in languages where case statements are implemented as hash tables instead.

That doesn’t really answer the initial question, though. Threequal, got it, sure. But what if we use === in our if/elsif statements? Is there any other performance difference between case and if/elsif, really?

OPENING UP THE HOOD WITH RubyVM::InstructionSequence

Oh, to hell with primary source materials. Let’s just open up the hood ourselves. Have you played with RubyVM::InstructionSequence yet? Seriously, it’s just about the niftiest thing around.

Want to see the YARV (“Yet Another Ruby Virtual machine”) bytecode your code really ends up translated into? Sure, no prob.

RubyVM::InstructionSequence::compile_file “[t]akes file, a String with the location of a Ruby source file, reads, parses and compiles the file, and returns iseq, the compiled InstructionSequence with source location metadata set.”

We can verify stuff pretty easily this way. Let’s start by testing out a case statement:

number = 15
case number
when 15
  'fifteen'
when 5
  'five'
when 3
  'three'
else
  number
end
== disasm: <RubyVM::InstructionSequence:<main>@./case.rb>===============
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1] s1)
[ 2] number     
0000 trace            1                                               (   1)
0002 putobject        15
0004 setlocal_OP__WC__0 2
0006 trace            1                                               (   2)
0008 getlocal_OP__WC__0 2
0010 dup              
0011 opt_case_dispatch <cdhash>, 35
0014 dup                                                              (   3)
0015 putobject        15
0017 checkmatch       2
0019 branchif         42
0021 dup                                                              (   5)
0022 putobject        5
0024 checkmatch       2
0026 branchif         49
0028 dup                                                              (   7)
0029 putobject        3
0031 checkmatch       2
0033 branchif         56
0035 pop                                                              (  10)
0036 trace            1
0038 getlocal_OP__WC__0 2
0040 leave            
0041 pop              
0042 pop                                                              (  11)
0043 trace            1                                               (   4)
0045 putstring        "fifteen"
0047 leave                                                            (  11)
0048 pop              
0049 pop              
0050 trace            1                                               (   6)
0052 putstring        "five"
0054 leave                                                            (  11)
0055 pop              
0056 pop              
0057 trace            1                                               (   8)
0059 putstring        "three"
0061 leave

And an if/elsif:

number = 15
if number == 15
  'fifteen'
elsif number == 5
  'five'
elsif number == 3
  'three'
else
  number
end
== disasm: <RubyVM::InstructionSequence:<main>@./if.rb>=================
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1] s1)
[ 2] number     
0000 trace            1                                               (   1)
0002 putobject        15
0004 setlocal_OP__WC__0 2
0006 trace            1                                               (   2)
0008 getlocal_OP__WC__0 2
0010 putobject        15
0012 opt_eq           <callinfo!mid:==, argc:1, ARGS_SKIP>
0014 branchunless     22
0016 trace            1                                               (   3)
0018 putstring        "fifteen"
0020 leave                                                            (   2)
0021 pop              
0022 getlocal_OP__WC__0 2                                             (   4)
0024 putobject        5
0026 opt_eq           <callinfo!mid:==, argc:1, ARGS_SKIP>
0028 branchunless     36
0030 trace            1                                               (   5)
0032 putstring        "five"
0034 leave                                                            (   4)
0035 pop              
0036 getlocal_OP__WC__0 2                                             (   6)
0038 putobject        3
0040 opt_eq           <callinfo!mid:==, argc:1, ARGS_SKIP>
0042 branchunless     50
0044 trace            1                                               (   7)
0046 putstring        "three"
0048 leave                                                            (   6)
0049 pop              
0050 trace            1                                               (   9)
0052 getlocal_OP__WC__0 2
0054 leave

Let’s pause for a moment and make some predictions. What do you think this all might mean?

(a/k/a Dear rabbit hole: I’m in you.)

Personally, I’m pretty intrigued by the fact that if/elsif uses branchunless, while case uses branchif. Based on that alone, I’d expect if/elsif to be faster in situations where one of the first few possibilities is a match, and for case to be faster in situations where a match is found only way further down the list (when if/elsif would have to make more jumps along the way on account of all those branchunlesses).

BENCHMARK ALL THE THINGS

To hell with reading the bytecode. Let’s just benchmark some shit.

Here’s my rough little benchmarking code. I actually tested if/elsif twice: once with ==, and once with ===.

I tested each option with a list of 15 predicates, ranging from 15 as the first down to 1 as the last. So according to my prediction, case should be fastest if I test with n == 1, and slowest when n == 15.

I ran the benchmark a bunch of times, and got results pretty consistently along these lines:

n = 1 (last clause matches)
if:           7.4821e-07
threequal_if: 1.6830500000000001e-06
case:         3.9176999999999997e-07
 
n = 15 (first clause matches)
if:           3.7357000000000003e-07
threequal_if: 5.0263e-07
case:         4.3348e-07

Benchmarking seems to mostly confirm our theory, except note that that threequal_if (a sequence of if/elsifs comparing with ===) was the slowest in both cases. It was slower by an order of magnitude when the last clause matched (n == 1), where both branchunless and the expensive === comparison were slowing it down, and even when the first clause matched (n == 15), when my guess is that the slowness of === outweighed the slowness of the single extra jump case had to make because of branchif.

(When I was initially writing this post, I had some messed up benchmarking and ended up way down the rabbit hole reading about branch prediction optimization, which your CPU deals with. This paper was interesting, too. But never mind that now.)

Anyways. None of this is dispositive, but we have some evidence and a better understanding of how things are implemented here, which is ultimately the real point.

And speaking of rabbit holes, I wonder why case and if/elsif have that different branching… something about different assumptions about how they’d be used, maybe? What about unless?

n = 1
unless n == 1
  puts "sadface"
end

As one might expect, unless uses branchunless (like case, and unlike if).

== disasm: <RubyVM::InstructionSequence:<main>@./unless.rb>=============
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1] s1)
[ 2] n          
0000 trace            1                                               (   1)
0002 putobject_OP_INT2FIX_O_1_C_ 
0003 setlocal_OP__WC__0 2
0005 trace            1                                               (   2)
0007 getlocal_OP__WC__0 2
0009 putobject_OP_INT2FIX_O_1_C_ 
0010 opt_eq           <callinfo!mid:==, argc:1, ARGS_SKIP>
0012 branchunless     17
0014 putnil           
0015 leave            
0016 pop              
0017 trace            1                                               (   3)
0019 putself          
0020 putstring        "sadface"
0022 opt_send_simple  <callinfo!mid:puts, argc:1, FCALL|ARGS_SKIP>
0024 leave

And there you have it. Any conclusions I might draw from that would be pure speculation, so I’ll leave the facts to stand on their own merits. Knowing is half the battle &c.

Hope this answers your question, Dan! And gives you a few extra tools for looking into the next few as well. ^^ Hooray for rabbit holes!

About Projects Blog Tags Email GitHub Twitter Press Feed