My second attempt at Coursera’s Scala course turned out to have a refreshingly twisty plot. The grand plan this time is to blaze through the course using both Scala and Clojure, and then top it off by dishing out code reviews of Storm (Clojure) and Kafka (Scala).
I didn’t expect the first week to hit me with a bunch of creepy differences in details that aren’t skin-deep. How hard could it be to write an infinite loop, right?
- The Background: Function Evaluation
- The Plan: Infinite Recursion
- The Twist I: TCO
- The Twist II: Lazy Evaluation
- Bonus Twist: Really Lazy Evaluation
The Background: Function Evaluation
The week started with some basic concepts on the substitution model of function execution. In a nutshell, there are two ways a called function can be reduced to a value in the substitution model:
- Call-by-value: The function’s arguments are fully reduced before the function is applied to them. This is the default method of substitution in both Clojure and Scala. An example series of reductions could be:
- Call-by-name: Applies the function to unreduced arguments. The arguments are evaluated in the body of the function if needed. This gives us a convenient way to short-circuit evaluation of a parameter if it is unused in the function body. An example series of reductions for this would be:
Note that I have omitted some of the reduction steps for brevity.
The Plan: Infinite Recursion
An in-class exercise was to write a function performing the boolean-and of two variables without using the built-in primitives. The crux was about remembering the short-circuits:
In Scala, this can be quickly accomplished by:
But the method above can be broken, and our weapon of choice is the infinite recursion loop:
This defines a function that calls itself. Note that because this is a def and not a val, the right-hand-side of the assignment is evaluated only when the function is called. val, in contrast, evaluates the right-hand-side at definition time.
Now if we provide this function as an argument to our boolean-and’er, we see the following reduction happening (remember that Scala reduces by-value):
What we’d like is to not evaluate the second parameter at all if the first is false. To achieve this, we can coax Scala into reducing a particular argument by-name instead:
The reduction for this would then be:
The infinite loop is never evaluated, since we have forced Scala to evaluate the second parameter by-name instead of by-value. Hence, we achieve our desired behaviour.
The Twist I: TCO
Translating this functionality to Clojure seemed trivial until I slammed into some odd behaviour. To start with, let’s write an innocent infinite loop function:
And let’s run this method and watch it eat up our CPU:
Unexpected, right? I was stumped! After some headless-chicken-scrambling into the details of function argument reduction in Clojure, I stumbled onto this illuminating blog post, comparing infinite loops in a number of languages. The sneaky culprit turned out to be TCO, or tail-call optimization.
Let’s look at what this means by a simple reduction example, applied to calculating the sum of ones upto N. Here is the non-tail-call-optimized reduction:
Observe how the chain of execution gets wider with increasing N. Each function call is appended to the call stack. Only when the last function is reduced does the stack get collapsed by subsequently reducing each function.
Our sumToN function can be tail-call-optimized by simply adding an additional laccumulator parameter. This parameter, along with the current N value, maintains the state of the function evaluation at that instant.
Notice how the evaluation of this function always has a constant tail length, and hence accumulates a constant amount of space on the call stack for any N value.
Scala Does Tail-Call Optimization By Default
And this happens really quietly. This is why the Scala infinite loop function we wrote earlier actually executed infinitely. In contrast, the Clojure one never runs, and the infinite function additions to the call stack eventually trigger a stack overflow.
The Twist II: Lazy Evaluation
To mimic Scala’s behaviour, let’s enforce tail-call optimization in Clojure:
Now let’s try the same boolean-and’ing exercise we did in Scala:
What did it just dump out on our feet? Where is our infinite loop?
The culprit this time turns out to be lazy evaluation; a function is not evaluated to its return value unless explicitly asked to. Clojure uses lazy evaluation by default, so instead of evaluating the function, it simply returns it to us, should we choose to explicitly evaluate it later. In contrast, Scala begins evaluating the loop function within the if-condition and hence runs into an infinite loop. That weird dump is a representation of our Clojure function.
Bonus Twist: Really Lazy Evaluation
Another interesting fact a comment in this blog post pointed out was an important difference between lazy evaluation in Haskell and Clojure. Consider this Haskell infinitely-recursive function:
When you call this function, it blocks infinitely, as expected. But in contrast to Clojure and Scala’s infinite loops, this one uses no CPU! Why?
It turns out Haskell is extremely lazy. Since our function doesn’t really do anything, it’s never really executed. You could simply print something to the screen within the function to set Haskell on fire, but it won’t budge until you do so.
The Computer Language Benchmark Game fell into Haskell’s lazy trap once, when the array sorting benchmark in Haskell outperformed hand-optimized C. It turned out that the sorted array was never printed out. So the Haskell function took the lazy way out and did absolutely nothing, while C worked hard to sort a bunch of numbers that would never see the light of day.