with thanks to
Le Ton Beau de Marot
by Douglas Hofstadter
So, let's pick our poem - or rather, algorithm -
implement it in the conventional way, then go through weirder and
weirder problems and variations
Union Find
Union Find is a beautiful little algorithm for categorizing items
into disjoint sets. I know, what?
disjoint sets are sets that have no elements in common
so if we take a bunch of items and separate them out into disjoint
sets, each item will belong to one and only one set
Union Find makes it easy to set this up, and really fast to query
items for which set they belong to
let's go through a specific example!
How can we tell if two addresses are
on the same island?
let's say we have a database of all the addresses and streets (not
counting bridges and tunnels) in the world, and we want to be able
to really easily check if two addresses are on the same island.
This might be useful for me, because I'm from NYC, which is made up
of a bunch of islands! I live on one island - Long Island - and work
on another - Manhattan!
And most of my coworkers live on a different island...
...or in their case, continent!
So, how can I tell if my awesome coworker Jeff and I live on the
same island?
I want to be able to really easily put in my address and my
coworker's address and find out if we're on the same island.
Let's also say that I expect to do a lot of querying, so I don't
want to have to traverse the graph every time if I can
speed up my read time by doing something clever in advance.
How might we be able to do this?
What if each island had a capital city! Then if we just asked each address what its capital city is, we could
really easily tell if they're on the same island.
if they have the same capital, they're on the same island, if not, not.
but that's not in the data we have. all we have is a bunch of nodes
(addresses) and edges connecting some of them (streets, not counting
bridges or tunnels)
so we want to somehow elect a capital city for each island - that is
to say, we want to have some identifier for each set that every
member of the set knows about - something that will be the same
for every member of the set, and different for members of different
sets
that's what Union Find lets us do
if we take two nodes and we want to join them into a set, we can just
pick one of them at random and say it's now the representative of that set!
if a new node is added in, we know what the representative is for the
set already, so we can just make sure the new node points to that
same representative
Union Find lets us elect a
random but consistent
representative for each set
it doesn't matter that the representative was chosen randomly,
since it's still doing what we need it to do - it's the same for all
members of a given set, and members of different sets will have
different representatives
So, let's look at some code...
Union Find
class DisjointSetItem:
    def __init__(self, payload):
        self.payload = payload
        self.parent = self
        self.rank = 0

    def find(self):
        # ...

    def union(self, other):
        # ...
first, we go through all addresses & wrap each in a DSI
("parent", not "representative", because we're going to implement each set
as a tree, where the representative - even though it's chosen
randomly! - is at the root of the tree.)
union joins two items into the same set; it makes sure they have the same parent
find returns the parent of the set an item is in - the root of its tree
our island-counting program
items = {}
for node in nodes:
    items[node] = DisjointSetItem(node)

for node in nodes:
    dsi = items[node]
    for edge in node.outgoing_edges:
        end_node_dsi = items[edge.end_node]
        dsi.union(end_node_dsi)
first we go through and call union on each pair of connected nodes.
Once we've done that, every island is a disjoint set / a separate tree.
So then we can easily figure out if two items are in the same set by
calling find on each of them - if their roots are the same, they're
in the same tree (and thus in the same set); if not, they're in different
trees (so, different sets).
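To make the check concrete, here's a hypothetical same_island helper - a self-contained sketch, with the find and union bodies filled in the way the later slides implement them:

```python
class DisjointSetItem:
    def __init__(self, payload):
        self.payload = payload
        self.parent = self  # each item starts as its own root
        self.rank = 0

    def find(self):
        # walk up to the root, compressing the path as we go
        if self.parent is not self:
            self.parent = self.parent.find()
        return self.parent

    def union(self, other):
        my_root, their_root = self.find(), other.find()
        if my_root is their_root:
            return
        if my_root.rank < their_root.rank:
            my_root, their_root = their_root, my_root
        their_root.parent = my_root  # the higher rank wins
        if my_root.rank == their_root.rank:
            my_root.rank += 1

def same_island(items, address_1, address_2):
    # two addresses share an island iff their trees share a root
    return items[address_1].find() is items[address_2].find()
```

After union-ing every street-connected pair, same_island is just two find calls and a comparison.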
what does this mean for our island question, though?
here we have 3 houses - our nodes - and two roads connecting them -
our edges.
the idea is we call union on every pair of nodes connected by a street
(not counting bridges and tunnels), and in the end, any nodes that
share the same root are on the same island, and if we compare two
nodes and they don't have the same root, we'll know that they're on
different islands
(this is used inside Open Trip Planner, for instance!)
joining two nodes
def union(self, other):
    my_root = self.find()
    their_root = other.find()
    if my_root == their_root:
        return
    elif my_root.rank < their_root.rank:
        my_root.parent = their_root
    elif my_root.rank > their_root.rank:
        their_root.parent = my_root
    else:
        their_root.parent = my_root
        my_root.rank += 1
we're going to keep track of our sets as
trees, so we can join two nodes into the same set by putting them
into the same tree structure
1) first, we find the root of each node's tree (we'll look at find in a moment)
2) if they're already joined, cool, we're done!
3) if not, we compare their ranks
a) if their ranks are the same, we pick one at random, increment
its rank, and set it to be the other node's parent
b) if they have different ranks, we say the higher rank wins and
its root becomes the other node's parent
so, here are our 3 nodes again. We can see the edges, but they haven't
been joined into any sets yet
I'm going to use roof color to indicate
what the parent of a node has been set to. Here each node's parent
is still itself.
we pick one at random, and...
...now we have a tree! the one we picked has a higher rank now,
and it's the parent of both! We can tell because the blue house has a
red roof now (okay, and also because of the arrow)
but what about that third green house?
we see it has a road connecting it to the blue house, but hasn't
been joined into the set yet - its roof is still green, its parent
is still itself.
Here we go. We compare the green house to the blue one because they're
connected by an edge - a road.
We look at the root of each - green's
root is itself, but blue's root is red!
And since red has a higher
rank (which is an approximation of the depth of its tree), red wins,
and green sets its parent to red as well.
finding a node's root
def find(self):
    if self.parent != self:
        self.parent = self.parent.find()
    return self.parent
as I mentioned, we're going to keep track of our sets by holding
each set in a tree structure, so find can just tell us what the root is for the tree an item is in,
and will return the same root for every node in a given set
if a node's parent isn't itself, we know it's not the root of its tree,
and we should walk up its tree to find the root.
as a bit of an optimization, we save the node's root as its parent while searching. This keeps our trees shallow.
here, let's say we call find on the bottom node.
its parent is set to that middle node (as you can see by its blue roof)
we know it's already joined, because otherwise its parent would be itself,
so we have to look further
so we go to the blue house and find out what its parent is, then
recur and look at the red house
there, we finally find that red's parent is itself
we return red, which is the root of the tree and thus
the representative member of the set
and there's also this side effect - we actually rejigger our tree
and set our initial node's parent to be the root of the tree, which
is red.
that'll make querying faster next time, because we end up with
these broad, shallow trees.
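Here's a tiny self-contained demo of that flattening side effect, using a stripped-down node class (names are illustrative):

```python
class Node:
    def __init__(self):
        self.parent = self  # each node starts as its own root

def find(item):
    # path compression: point the node straight at the root on the way out
    if item.parent is not item:
        item.parent = find(item.parent)
    return item.parent

# build a chain: c -> b -> a
a, b, c = Node(), Node(), Node()
b.parent = a
c.parent = b

root = find(c)
# c (and b) now point directly at a; the chain has been flattened
```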
so, that's it really. it's a pretty short, sweet algorithm, but
it uses a lot of interesting stuff that we take for granted all the time
Some of the fancy stuff this uses
classes
recursion
mutable variables
pointers
returning from functions
automatic memory management
if statements
What if we strip some of that stuff away?
What if we couldn't have classes?
We could say, we only have dumb records (or structs)
Structs will do just fine
struct DisjointSetItem {
    struct DisjointSetItem *parent;
    void *payload;
    int rank;
};
Let's pretend we're writing in C...
and functions that operate on pointers to structs
struct DisjointSetItem *DSI_find(
        struct DisjointSetItem *item) {
    if (item->parent != item)
        item->parent = DSI_find(item->parent);
    return item->parent;
}
So now we have these families of functions, which operate on and
return pointers to structs, no big deal
We can even manage inheritance without classes
struct DSISubclass {
    struct DisjointSetItem item;
    int stuff;
};
this is not super important, but it's a kinda cool sidenote
If you have a DSISubclass instance, and you pass a pointer to it
into any of the functions that expect a DisjointSetItem,
everything's going to work out fine, because the beginning of
every DSISubclass is just a DisjointSetItem (not a pointer, the struct itself).
Losing classes is no great loss.
Let's talk about the stack
We need this for recursion to work!
the Amsterdam plot
Someone had to invent the idea of the stack -
variously attributed to Turing, Bauer, Dijkstra, and Grace Hopper - the usual superheroes
Recursion had to be snuck into ALGOL-60, in a move that one of the
committee members (Bauer) actually referred to as the
"Amsterdam plot"! (Perpetrated in part by Dijkstra from Amsterdam)
our original find implementation
def find(self):
    if self.parent != self:
        self.parent = self.parent.find()
    return self.parent
"But this isn't recursive! You're calling a method by the same name,
but it's a different method defined on a different object!"
Each instance just has a pointer to the same function in memory -
it doesn't change, so it'd be really wasteful to keep copying it around!
in fact, to make this easier to read, let's rewrite it as a function
find, re-written as a function
def find(item):
    if item.parent != item:
        new_parent = find(item.parent)
        item.parent = new_parent
    return item.parent
this is the same code as our original method, but it should be easier
to read as a function, so we can see really easily when we're passing
the item's parent into the same find() function.
So now we can see the problem - if we don't have a stack, then find
is fucked - it'll clobber over itself in memory.
memory clobbered, without the stack
def find(item):
    if item.parent != item:
        new_parent = find(item.parent)
        # aw hell, item now points to
        # the original item.parent!
        (item.parent).parent = new_parent
    return (item.parent).parent
Once upon a time, the idea was that each function would have some fixed
area of memory that was that function's own area. And so you couldn't
do recursion, because when you called the function from within itself,
the second call's variables would clobber over the first call's.
So, how could we rewrite find if we didn't have the stack to
save us from this problem?
find, as a function, de-optimized
def find(item):
    if item.parent == item:
        return item
    else:
        return find(item.parent)
earlier, we looked at the fancy version of find,
which does what's called "path compression". It had this side effect
- it updated the item's parent whenever we called find and discovered
that the item's root was different from its current parent.
We can do without that - it'll hurt our performance, but it'll work.
there's no data that needs to be preserved after the
recursive call - that's what makes it "tail recursive".
tail recursion can be transformed into a loop
def find(item):
    current = item
    while current.parent != current:
        current = current.parent
    return current
Here's the new version!
We've just replaced tail recursion with a loop. No stack needed!
Tail Call Optimization
(some compilers will transform recursive functions into loops for you!)
Some smart compilers will do that for you, which is called "tail
call optimization".
gcc will supposedly even do this for some non-tail recursive
functions! How weird and cool is that?
via http://ridiculousfish.com/blog/posts/will-it-optimize.html
python won't
“
in many cases... the elimination of stack frames doesn't do anything for the algorithmic complexity of the code, but it does make debugging harder
”
- Guido van Rossum,
http://neopythonic.blogspot.com/2009/04/final-words-on-tail-calls.html
Python won't - Guido is very explicitly opposed to tail call optimization,
and you can read the blog post he wrote explaining why he won't include
it in Python
ruby will, if you configure it to do so
RubyVM::InstructionSequence.compile_option = {
  :tailcall_optimization => true,
  :trace_instruction => false
}
MRI (standard) Ruby will do tail call optimization, but only if you
explicitly configure it to do so.
scala can warn you if you get in its way
import scala.annotation.tailrec

@tailrec final def find: DisjointSetItem = {
  if (parent == this) {
    this
  } else {
    parent.find
  }
}
Scala will not only do tail call optimization for you, it has an
annotation you can use to have the compiler throw an error if the annotated
method can't be transformed into a loop!
back to our original find, as a function
def find(item):
    if item.parent != item:
        item.parent = find(item.parent)
    return item.parent
Let's look at original, nicely optimized find implementation.
It has to do something *after* the inner call to find returns, so
it's not tail recursive
with path compression, transformed into a loop
def find(item):
    items = [item]
    # find the root
    while items[-1].parent != items[-1]:
        items.append(items[-1].parent)
    root = items[-1]
    # set each node's parent as the root
    for i in items:
        i.parent = root
    return root
The basic idea of this code is that we just keep a list of all of the
items that need updating, and then once we know what the final result
is (after going through our loop), we go back and update them.
Now, we've done some "extra" memory allocation here, to hold the list of values.
But in the old version, we were doing that extra allocation to hold the stack.
It was just hidden from us.
But could we do actual recursion without having a stack?
(instead of translating it into a loop)
we could just build our own stack
def find(item):
    return find_recursive(item, [])

def find_recursive(item, children):
    if item.parent == item:
        return item
    children.append(item)
    root = find_recursive(item.parent, children)
    item = children.pop()
    item.parent = root
    return item.parent
So what if we create an explicit inverse for our find function?
append/pop should do the trick.
this is a very erlangy sort of a pattern
before each recursive call, we append the current item to the list
of children, which is now our stack. as we return from
each recursive call, we take that most recent item off the end of
children and set its parent before returning.
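Here's a self-contained, runnable version of the explicit-stack find (note the recursive call has to descend to item.parent, and the root case just returns the item), plus a quick sanity check on a three-node chain:

```python
class Node:
    def __init__(self):
        self.parent = self  # each node starts as its own root

def find(item):
    return find_recursive(item, [])

def find_recursive(item, children):
    # base case: a root is its own parent
    if item.parent is item:
        return item
    # push the current item onto our hand-rolled stack...
    children.append(item)
    root = find_recursive(item.parent, children)
    # ...and pop it on the way back out, compressing the path
    child = children.pop()
    child.parent = root
    return root

# chain: c -> b -> a
a, b, c = Node(), Node(), Node()
b.parent = a
c.parent = b
root = find(c)
```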
What if we had no variables?
"Variable" can just mean: a name for a value. So, "no variables" would
mean "assembly language", which just has registers and memory
locations.
But a more interesting definition of "variable" is "mutable memory
location". Languages like Erlang and Haskell, more or less, don't have variables
of this sort. You can create new values, and they can point to old values.
But once you bind a "variable" to some value, you can't ever change it or bind
it to a different value.
Since I don't want to get caught up in Haskell's other funny stuff or
distract you with weird syntax, I'm going to keep this in Python.
our original union implementation
(this is all about the side effects!)
def union(self, other):
    my_root = self.find()
    their_root = other.find()
    if my_root == their_root:
        return
    elif my_root.rank < their_root.rank:
        my_root.parent = their_root
    elif my_root.rank > their_root.rank:
        their_root.parent = my_root
    else:
        their_root.parent = my_root
        my_root.rank += 1
The first question is: if our union function can't change anything,
then how do we later notice that it's happened?
I mean, this is all about the side effects!
Instead, we're going to change our API around a bit. Instead of
talking about individual items, we're going to talk about the disjoint
set forest as whole.
an api that accommodates immutable data
class DisjointSetForest:
    def make_set(self, new_item_payload):
        # ...
        return DisjointSetForest(*args)

    def union(self, item_1_id, item_2_id):
        # ...
        return DisjointSetForest(*args)

    def find(self, item_id):
        # ...
        return item_root, DisjointSetForest(*args)
We need a new method make_set, which takes the place of our old
constructor.
This time, instead of just creating a new node, it creates the node and then
returns a new version of the forest, with the new node as part of the new forest.
Union also has to return a new forest, one where these two
nodes have already been joined.
In fact, as you can see, *every* method has to return a new forest.
And finally - and this is weird! - find has to return both what was
found, and the updated forest. This is a place where our API breaks
down a little bit, because find is, from the interface perspective,
not a place where we're modifying anything. It's just a query.
every change returns
an entirely new forest
If we can't mutate the old forest, with each change, we have to return a new
forest incorporating that change!
make_set takes the forest and a new node, wraps the new node in a
new DisjointSetItem, and returns a new forest that is the same as
the original forest except it also includes the new node
union takes the forest and the IDs of two items in that forest,
and returns a new forest that is the same as the original forest
except that the two nodes identified by those IDs have been joined
into the same set
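To make this concrete, here's a minimal runnable sketch of the immutable API. The method names match the slide, but the bodies are assumptions: integer IDs are indices into a tuple of items, and rank bookkeeping and path compression are skipped to keep it short:

```python
class Item:
    # immutable by convention: we never reassign these fields
    def __init__(self, payload, parent, rank):
        self.payload = payload
        self.parent = parent  # parent is an index into the forest's items
        self.rank = rank

class DisjointSetForest:
    def __init__(self, items=()):
        self.items = items

    def make_set(self, payload):
        # the new item's ID is its index; it starts as its own root
        new_item = Item(payload, len(self.items), 0)
        return DisjointSetForest(self.items + (new_item,))

    def find(self, item_id):
        # no path compression here, so the forest comes back unchanged
        while self.items[item_id].parent != item_id:
            item_id = self.items[item_id].parent
        return item_id, self

    def union(self, item_1_id, item_2_id):
        root_1, _ = self.find(item_1_id)
        root_2, _ = self.find(item_2_id)
        if root_1 == root_2:
            return self
        # rebuild only the losing root, pointing it at the winner
        old = self.items[root_2]
        new_child = Item(old.payload, root_1, old.rank)
        items = (self.items[:root_2] + (new_child,) +
                 self.items[root_2 + 1:])
        return DisjointSetForest(items)

forest = DisjointSetForest()
forest = forest.make_set('12 Main St')   # id 0
forest = forest.make_set('34 Elm St')    # id 1
forest = forest.make_set('56 Oak Ave')   # id 2
forest = forest.union(0, 1)
root_a, forest = forest.find(0)
root_b, forest = forest.find(1)
root_c, forest = forest.find(2)
```

Every call hands back a new forest, and the caller just threads it along - which is exactly the pattern the next slides dig into.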
performance in an immutable world
whoa, that's a lot of copying stuff around! how can we live this way?
the clojure devs out there are smiling, because this is just how it is -
there's probably a Datomic talk this week, right?
we're going to need a way to store the nodes that's going to be easy to copy. We're
going to be doing a lot of copying, because where we previously
changed things in place, we now have to return a new copy.
Shallow Trees
can approximate
Arrays
(like Clojure's persistent vectors!)
We're going to go with arrays.
we mean the abstract data type, not "a consecutive sequence of mutable memory
cells" - so, immutable lists indexed by number that can be accessed at an arbitrary
index in approximately O(1) time.
Clojure implements something like this with its persistent vectors!
Bit-partitioned Tries
Clojure implements these with 32-ary trees. Arrays, of
course, are O(1) to read from an arbitrary index or write to an
arbitrary index. Trees aren't O(1) for anything.
But 32-ary trees aren't terrible. If they're reasonably well-balanced, they're nice &
shallow, which gets us close enough to constant time for Rich Hickey's purposes
tries / prefix trees: the keys are usually letters, here they're digits - and using bits gives us faster operations
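To get a feel for bit-partitioned tries, here's a toy 2-ary version (Clojure uses 5 bits per level, i.e. 32-way branching; everything else here is an illustrative assumption). Each level of the trie consumes one bit of the index, and updating a slot copies only the path down to it:

```python
DEPTH = 3  # 2**3 = 8 slots; Clojure uses 5 bits (32-way) per level

def full(depth, value=0):
    # build a complete trie with every slot holding `value`
    if depth == 0:
        return value
    child = full(depth - 1, value)
    return (child, child)

def get(node, i):
    # use one bit of the index per level to pick a branch
    for level in reversed(range(DEPTH)):
        node = node[(i >> level) & 1]
    return node

def assoc(node, i, value, depth=DEPTH):
    # path copying: rebuild only the nodes on the way down to slot i,
    # sharing every untouched subtree with the old trie
    if depth == 0:
        return value
    bit = (i >> (depth - 1)) & 1
    children = list(node)
    children[bit] = assoc(node[bit], i, value, depth - 1)
    return tuple(children)
```

An update gives you a whole new trie, but almost all of it is the old trie's memory, shared.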
copying persistent vectors
(lots of shared memory!)
Cool, what does that get us?
Well, we can copy our immutable "arrays" really quickly, by which I mean create a new
tree but with one element different, which is basically what we'll want to do all the
time now. It's fast, and it shares most of the memory with the previous version
of the tree.
Clojure has a bunch of nice optimizations, but this is the basic idea.
You can read more about
Clojure's persistent vectors here:
http://hypirion.com/musings/understanding-persistent-vector-pt-1
For now, rather than switching to a distractingly different syntax,
I'll just keep using Python. We can pretend that Python's tuples are Clojure's
persistent vectors, immutable and rather beautifully implemented.
our immutable items
class DisjointSetItem:
    def __init__(self, payload, parent, rank):
        self.payload = payload
        self.parent = parent if parent is not None else self
        self.rank = rank
we're going to pretend that once we assign these fields, they can never be modified. When we need to make changes,
we just make a new DSI
no more instance methods - find and union are now
implemented on the forest, not on any items in the forest.
so, we keep replacing items with new ones instead of changing them, great, but
how do the other items know about this? we're replacing them, but... where?
losing track of things...
since we're no longer passing
around DisjointSetItems, how do we keep track of changes in an item's parent?
whoops!
Before, it was easy: each node held a pointer to its parent, so if the parent
changed, its children would still be referencing the changed parent and it'd be
no big deal.
Now, if a node's parent changes, we just have a totally new object, and the old
pointer isn't helpful anymore! We're facing approximately the same problem as if
we didn't have pointers at all!
there's more to it than that, but it's conceptually pretty similar!
the forest can keep an "array" of items
(with indices as their IDs!)
We could do something fancy with a dictionary (which we might implement as a
search tree), but to keep the slides short, let's stick with this "array" idea
(which is really a 32-ary tree anyways).
We'll use integer IDs for our items once we store them in the forest, and those
IDs can be array indexes.
When we add a new item, its ID will be the max_id of the new forest we return.
make_set, without variables
def max_id(self):
    return len(self.items) - 1

def make_set(self, payload):
    new_item = DisjointSetItem(payload,
                               None,  # parent
                               0)     # rank
    new_items = self.items + (new_item,)
    return DisjointSetForest(new_items)
So we can see now how make_set has to work!
We create a new DisjointSetItem, which has that item passed in as its
payload, no parent, and zero rank.
We create a new array, which has all of the previous items plus this new item.
And we create a new DisjointSetForest, which has the new array of items.
If we want to get our new item's id, which is its index into the array of items
in the forest, we can just call .max_id on the new forest we've returned.
Again, with Python tuples this is going to be inefficient. But if we had some
tree-based immutable list structure, it would be, well, still somewhat inefficient,
but not terrible.
union, without variables
def union(self, item_1_id, item_2_id):
    # ... (find each item's root and its ID)
    if root_1.rank < root_2.rank:
        return self.replace(root_1_id, root_2_id)
    elif root_1.rank > root_2.rank:
        return self.replace(root_2_id, root_1_id)
    else:
        return self.replace(root_2_id, root_1_id, 1)
Our new union function starts about the same as the old one,
but instead of changing any existing data structure, it returns
a new updated DisjointSetForest
def replace(self, index, root_index, incr=0):
    existing = self.items[index]
    new_item = DisjointSetItem(
        existing.payload,
        root_index,
        existing.rank + incr)
    new_items = (self.items[:index] +
                 (new_item,) + self.items[index + 1:])
    return DisjointSetForest(new_items)
And here's the actual replacement function.
We grab the item that needs to point at a new parent, and update its parent ID
to the new root index we're passing in.
Then we create a new array of items that has our updated version in place of the
old one, and return a new forest wrapping it
In the roof of each item, the number you see is the index of its parent -
since we can't hold onto a reference to a mutable object, we're holding onto
the index of the parent in that array of items in the forest.
if the item at that parent index is replaced, fine, the replacement is now our item's parent,
since an item only holds onto its parent by index - aka ID, aka a pointer to it.
I'm not going to implement find, because you get the idea
...oh hey, we've invented pointers!
If we didn't have pointers, we'd have to do something similar - create
and hold onto a data structure that we could index into, and use
those indices to refer to objects' locations in our data structure.
Except there's a bunch of more complicated memory management stuff
we'd also have to do there that I don't really have time to get into
here, unfortunately!
What if we couldn't return from functions?
This whole time, we've been assuming that you call a function, and
when you're done, you return from that function, clutching a bloody
return value in your clenched fist. But what if you could never go home again?
If it was just about not returning values, we could do all side-effects, all the time.
But if you couldn't return at all..?
Then you would have no choice but to journey onward, ever deeper into
the call stack. How could that work?
We'd just go on and on, my friend...
def Constructor(self, var1, var2, continuation):
    self.var1 = var1
    self.var2 = var2
    continuation(self)
Let's take a simple constructor...
now you pass a "continuation" into your constructor, along with
whatever else you would want to pass. And then instead of returning,
you just call continuation with whatever you would ordinarily return.
A continuation is just a function that represents the rest of your
program. I'm going to pause here and let that sink in, because it's
weird.
Continuation Passing Style
Actually, in some sense, it's simpler than ordinary function
calling.
Ordinarily, you have to call, and then return, and keep
track of a stack. With continuation passing style, there's no return,
so there's no stack.
You could compile this down to a bunch of goto statements!
“
we should... make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.
”
- Dijkstra, Goto Considered Harmful
our original island-counting program
items = {}
for node in nodes:
    items[node] = DisjointSetItem(node)

for node in nodes:
    dsi = items[node]
    for edge in node.outgoing_edges:
        end_node_dsi = items[edge.end_node]
        dsi.union(end_node_dsi)
let's look at our island-counting program again
first we go through and call union on each pair of connected nodes.
Once we've done that, every island is a disjoint set / a separate tree.
So then we can easily figure out if two items are in the same set by
calling find on each of them - if their roots are the same, they're
in the same tree (and thus in the same set); if not, they're in different
trees (so, different sets).
let's just look at the first part...
items = {}
for node in nodes:
    items[node] = DisjointSetItem(node)
Now let's translate this to CPS.
without iteration or mutable variables
def process_node(items, nodes, index):
    if index == len(nodes):
        return items
    node = nodes[index]
    items = items.add(node,
                      DisjointSetItem(node))
    return process_node(items,
                        nodes,
                        index + 1)

# process all the nodes:
items = process_node({}, nodes, 0)
straightforward if long-winded:
1) if we've already processed all the nodes, we're done - return our finished
map of items
2) otherwise, we wrap that node in a DSI and add it to the items map
with the node itself as the key (so we can find it later on), and recurse on to
the next node in the list
no big deal, we've done this sort of stuff before
in continuation passing style
def process_node(items, nodes, index, k):
    if index == len(nodes):
        k(items)
    node = nodes[index]
    def add_and_continue(item):
        add_item(items, node, item, rest)
    def rest(items):
        process_node(items, nodes, index+1, k)
    DisjointSetItem(node, add_and_continue)
And now we get to the batshit insane continuation-passing style version
Holy shit, what happened to our code?! Let's go through this in chunks...
when we're done, we just go deeper
def process_node(items, nodes, index, k):
    if index == len(nodes):
        k(items)
    # ...
So, here, where we would have returned, we just pass our map on to the
next bit of code, this continuation (which, by convention, is
called k).
Moving on, we grab the current node, and define a few helper functions:
our first helper function
node = nodes[index]
def add_and_continue(item):
    add_item(items, node, item, rest)
The first helper function can't call add on the map, because this time it has to
pass in a continuation as the last argument.
So instead we'll have to define and use a function that takes a map, a key, a
value, and a continuation function
The continuation ('rest') will get called inside of add_item with the newly created, larger map.
our second helper function, which contains the recursion
def rest(items):
    process_node(items, nodes, index + 1, k)
The second continuation does the recursion, which was at the end of
our original function.
Since it's at the end of this process, we have to pass in
the original continuation, which takes the place of the return statement and
holds the rest of our program.
k will get passed through each recursive step until it's called at the end
we pass a node & a continuation to our constructor
DisjointSetItem(node, add_and_continue)
Finally, we call our constructor with the first helper function, to get this
show on the road
I'm not going to write out what the constructor now looks like, but it takes a
continuation - add_and_continue - which also acts as a closure that holds
references to the rest of the nodes, the rest of the items, and the continuation
that holds the rest of our program
Feel free to grab me later if this is all gibberish and you want to chat about
closures in the hallway, by the way!
def process_node(items, nodes, index, k):
    if index == len(nodes):
        k(items)
    # node = nodes[index]
    # def add_and_continue(item):
    #     add_item(items, node, item, rest)
    # def rest(items):
    #     process_node(items, nodes, index+1, k)
    # DisjointSetItem(node, add_and_continue)
So let's walk through how this would get executed in the case of a
one-element list.
First, we check if we're at the end.
def process_node(items, nodes, index, k):
    # if index == len(nodes):
    #     k(items)
    node = nodes[index]
    # def add_and_continue(item):
    #     add_item(items, node, item, rest)
    # def rest(items):
    #     process_node(items, nodes, index+1, k)
    DisjointSetItem(node, add_and_continue)
We're not, so we look at the first node.
By the way, we're not going to worry about passing a continuation when indexing
into the nodes array, because ultimately indexing into arrays is just pointer arithmetic
We create a DisjointSetItem from this node, passing in our helper function
add_and_continue, which will be called with the new DSI created by the constructor.
def process_node(items, nodes, index, k):
    # if index == len(nodes):
    #     k(items)
    # node = nodes[index]
    def add_and_continue(item):
        add_item(items, node, item, rest)
    # def rest(items):
    #     process_node(items, nodes, index+1, k)
    # DisjointSetItem(node, add_and_continue)
add_and_continue creates a new map containing only a mapping
from this node to the item, and passes it to rest.
def process_node(items, nodes, index, k):
    # if index == len(nodes):
    #     k(items)
    # node = nodes[index]
    # def add_and_continue(item):
    #     add_item(items, node, item, rest)
    def rest(items):
        process_node(items, nodes, index+1, k)
    # DisjointSetItem(node, add_and_continue)
rest passes the map into a recursive call to process_node, with index = 1.
def process_node(items, nodes, index, k):
    if index == len(nodes):
        k(items)
    # node = nodes[index]
    # def add_and_continue(item):
    #     add_item(items, node, item, rest)
    # def rest(items):
    #     process_node(items, nodes, index+1, k)
    # DisjointSetItem(node, add_and_continue)
And since we're at the end this time, since index = 1 and we only had 1 node to
process, we finally call k, which is the rest of our program, passing in the
new items map.
Does continuation passing style
actually work?
Does this CPS stuff actually work?
Not in Python! because as we discussed earlier, Python doesn't have tail-call
optimization! So no matter how you write it, endlessly calling functions will
just add frames to the stack with each call, and for any reasonably complex
program you'll end up running out of room on the stack.
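You can watch this happen: even a perfectly tail-recursive, CPS-flavored function overflows in Python once it recurses past the interpreter's limit. A minimal demonstration (the toy countdown function is my own, not from the talk):

```python
import sys

def countdown_cps(n, k):
    # a tail call, but CPython still pushes a frame for every step
    if n == 0:
        return k(n)
    return countdown_cps(n - 1, k)

overflowed = False
try:
    countdown_cps(sys.getrecursionlimit() + 100, lambda n: 'done')
except RecursionError:
    overflowed = True
```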
avoiding stack overflow
A common way to work around this is to use a trampoline!
avoiding stack overflow
def trampoline(k):
    while callable(k):
        k = k()
    return k
The idea is that instead of each function calling the continuation, it returns
(I know, oh cheating sadness! - let's say, "bounces back") a function to the
trampoline, which is a loop that calls all of the continuations for us.
Which is all a bit silly, because continuation passing style obviates the need for
a call stack anyways - the point of a stack is to allow you to return to a previous
frame without having changed its context, but here you can never return, so there's
no need to preserve any previous frames
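Here's a minimal self-contained sketch of the bouncing in action (the toy countdown function and names are my own assumptions, repeated with the trampoline for completeness):

```python
def trampoline(k):
    # keep bouncing until we get a non-callable final value
    while callable(k):
        k = k()
    return k

def countdown_cps(n, k):
    # return a thunk instead of recursing; the trampoline makes the call,
    # so the Python stack never gets more than a couple of frames deep
    if n == 0:
        return lambda: k('done')
    return lambda: countdown_cps(n - 1, k)

result = trampoline(lambda: countdown_cps(100000, lambda x: x))
```

100,000 "recursive" steps, far past Python's recursion limit, and no stack overflow.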
so, what have we accomplished?
Well, we've replaced two perfectly good lines of code
with a mess of incomprehensible gibberish. Win!
I'm being facetious, of course. CPS is pretty cool! Some
compilers (e.g. Chicken Scheme) use it as an intermediate representation, because it's dead
simple - do one thing, goto, do another, goto.
Thing is, it's cool for compilers - it's a super clear and precise control flow! - but
pretty terrible for humans - hard to follow, Dijkstra was right (shocking, no?)
Let it go!
(We don't need that stuff, anyway!)
Most of all, though, this showed us that even if we get rid of most of our
programming tools: mutable variables, the stack, and returning from
functions, we can still do everything we want.
Don't Take it for Granted
(thank you!)
@DanielleSucher
dsucher@gmail.com
http://daniellesucher.com
Go ahead, take it for granted! If we lost all this stuff that we rely on in the
languages we use every day, so what? We'd get over it. We could rebuild what
we lost (and make it bigger, stronger, faster! &c)