with thanks to
Le Ton Beau de Marot
by Douglas Hofstadter
So, let's pick our poem - or rather, algorithm -
implement it in the conventional way, then go through weirder and
weirder problems and variations
Union Find
Union Find is a beautiful little algorithm for categorizing items
into disjoint sets. I know, what?
disjoint sets are sets that have no elements in common
so if we take a bunch of items and separate them out into disjoint
sets, each item will belong to one and only one set
Union Find makes it easy to set this up, and really fast to query
items for which set they belong to
let's go through a specific example!
How can we tell if two addresses are
on the same island?
let's say we have a database of all the addresses and streets (not
counting bridges and tunnels) in the world, and we want to be able
to really easily check if two addresses are on the same island.
This might be useful for me, because I'm from NYC, which is made up
of a bunch of islands! I live on one island - Long Island - and work
on another - Manhattan!
And most of my coworkers live on a different island...
...or in their case, continent!
So, how can I tell if my awesome coworker Jeff and I live on the
same island?
I want to be able to really easily put in my address and my
coworker's address and find out if we're on the same island.
Let's also say that I expect to do a lot of querying, so I don't
want to have to traverse the graph every time if I can
speed up my read time by doing something clever in advance.
How might we be able to do this?
What if each island had a capital city! Then if we just asked each address what its capital city is, we could
really easily tell if they're on the same island.
if they have the same capital, they're on the same island, if not, not.
but that's not in the data we have. all we have is a bunch of nodes
(addresses) and edges connecting some of them (streets, not counting
bridges or tunnels)
so we want to somehow elect a capital city for each island - that is
to say, we want to have some identifier for each set that every
member of the set knows about - something that will be the same
for every member of the set, and different for members of different
sets
that's what Union Find lets us do
if we take two nodes and we want to join them into a set, we can just
pick one of them at random and say it's now the representative of that set!
if a new node is added in, we know what the representative is for the
set already, so we can just make sure the new node points to that
same representative
Union Find lets us elect a
random but consistent
representative for each set
it doesn't matter that the representative was chosen randomly,
since it's still doing what we need it to do - it's the same for all
members of a given set, and members of different sets will have
different representatives
So, let's look at some code...
Union Find
class DisjointSetItem:
    def __init__(self, payload):
        self.payload = payload
        self.parent = self
        self.rank = 0

    def find(self):
        # ...

    def union(self, other):
        # ...
first, we go through all addresses & wrap each in a DSI
("parent", not "representative", because we're going to implement each set
as a tree, where the representative - even though it's chosen
randomly! - is at the root of the tree.)
union joins two items into the same set; it makes sure they have the same parent
find returns the parent of the set an item is in - the root of its tree
our island-counting program
items = {}
for node in nodes:
    items[node] = DisjointSetItem(node)

for node in nodes:
    dsi = items[node]
    for edge in node.outgoing_edges:
        end_node_dsi = items[edge.end_node]
        dsi.union(end_node_dsi)
first we go through and call union on each pair of connected nodes.
Once we've done that, every island is a disjoint set / a separate tree.
So then we can easily figure out if two items are in the same set by
calling find on each of them - if their roots are the same, they're
in the same tree (and thus in the same set); if not, they're in different
trees (so, different sets).
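To make the check concrete, here's a hypothetical same_island helper - a self-contained sketch, with the find and union bodies filled in the way the later slides implement them:

```python
class DisjointSetItem:
    def __init__(self, payload):
        self.payload = payload
        self.parent = self  # each item starts as its own root
        self.rank = 0

    def find(self):
        # walk up to the root, compressing the path as we go
        if self.parent is not self:
            self.parent = self.parent.find()
        return self.parent

    def union(self, other):
        my_root, their_root = self.find(), other.find()
        if my_root is their_root:
            return
        if my_root.rank < their_root.rank:
            my_root, their_root = their_root, my_root
        their_root.parent = my_root  # the higher rank wins
        if my_root.rank == their_root.rank:
            my_root.rank += 1

def same_island(items, address_1, address_2):
    # two addresses share an island iff their trees share a root
    return items[address_1].find() is items[address_2].find()
```

After union-ing every street-connected pair, same_island is just two find calls and a comparison.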
what does this mean for our island question, though?
here we have 3 houses - our nodes - and two roads connecting them -
our edges.
the idea is we call union on every pair of nodes connected by a street
(not counting bridges and tunnels), and in the end, any nodes that
share the same root are on the same island, and if we compare two
nodes and they don't have the same root, we'll know that they're on
different islands
(this is used inside Open Trip Planner, for instance!)
joining two nodes
def union(self, other):
    my_root = self.find()
    their_root = other.find()
    if my_root == their_root:
        return
    elif my_root.rank < their_root.rank:
        my_root.parent = their_root
    elif my_root.rank > their_root.rank:
        their_root.parent = my_root
    else:
        their_root.parent = my_root
        my_root.rank += 1
we're going to keep track of our sets as
trees, so we can join two nodes into the same set by putting them
into the same tree structure
1) first, we find the root of each node's tree (we'll look at find in a moment)
2) if they're already joined, cool, we're done!
3) if not, we compare their ranks
a) if their ranks are the same, we pick one at random, increment
its rank, and set it to be the other node's parent
b) if they have different ranks, we say the higher rank wins and
its root becomes the other node's parent
so, here are our 3 nodes again. We can see the edges, but they haven't
been joined into any sets yet
I'm going to use roof color to indicate
what the parent of a node has been set to. Here each node's parent
is still itself.
we pick one at random, and...
...now we have a tree! the one we picked has a higher rank now,
and it's the parent of both! We can tell because the blue house has a
red roof now (okay, and also because of the arrow)
but what about that third green house?
we see it has a road connecting it to the blue house, but hasn't
been joined into the set yet - its roof is still green, its parent
is still itself.
Here we go. We compare the green house to the blue one because they're
connected by an edge - a road.
We look at the root of each - green's
root is itself, but blue's root is red!
And since red has a higher
rank (which is an approximation of the depth of its tree), red wins,
and green sets its parent to red as well.
finding a node's root
def find(self):
    if self.parent != self:
        self.parent = self.parent.find()
    return self.parent
as I mentioned, we're going to keep track of our sets by holding
each set in a tree structure, so find can just tell us what the root is for the tree an item is in,
and will return the same root for every node in a given set
if a node's parent isn't itself, we know it's not the root of its tree,
and we should walk up its tree to find the root.
as a bit of an optimization, we save the node's root as its parent while searching. This keeps our trees shallow.
here, let's say we call find on the bottom node.
its parent is set to that middle node (as you can see by its blue roof)
we know it's already joined, because otherwise its parent would be itself,
so we have to look further
so we go to the blue house and find out what its parent is, then
recur and look at the red house
there, we finally find that red's parent is itself
we return red, which is the root of the tree and thus
the representative member of the set
and there's also this side effect - we actually rejigger our tree
and set our initial node's parent to be the root of the tree, which
is red.
that'll make querying faster next time, because we end up with
these broad, shallow trees.
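Here's a tiny self-contained demo of that flattening side effect, using a stripped-down node class (names are illustrative):

```python
class Node:
    def __init__(self):
        self.parent = self  # each node starts as its own root

def find(item):
    # path compression: point the node straight at the root on the way out
    if item.parent is not item:
        item.parent = find(item.parent)
    return item.parent

# build a chain: c -> b -> a
a, b, c = Node(), Node(), Node()
b.parent = a
c.parent = b

root = find(c)
# c (and b) now point directly at a; the chain has been flattened
```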
so, that's it really. it's a pretty short, sweet algorithm, but
it uses a lot of interesting stuff that we take for granted all the time
Some of the fancy stuff this uses
classes
recursion
mutable variables
pointers
returning from functions
automatic memory management
if statements
What if we strip some of that stuff away?
What if we couldn't have classes?
We could say, we only have dumb records (or structs)
Structs will do just fine
struct DisjointSetItem {
    struct DisjointSetItem *parent;
    void *payload;
    int rank;
};
Let's pretend we're writing in C...
and functions that operate on pointers to structs
struct DisjointSetItem *DSI_find(
        struct DisjointSetItem *item) {
    if (item->parent != item)
        item->parent = DSI_find(item->parent);
    return item->parent;
}
So now we have these families of functions, which operate on and
return pointers to structs, no big deal
We can even manage inheritance without classes
struct DSISubclass {
    struct DisjointSetItem item;
    int stuff;
};
this is not super important, but it's a kinda cool sidenote
If you have a DSISubclass instance, and you pass a pointer to it
into any of the functions that expect a DisjointSetItem,
everything's going to work out fine, because the beginning of
every DSISubclass is just a DisjointSetItem (not a pointer, the struct itself).
Losing classes is no great loss.
Let's talk about the stack
We need this for recursion to work!
the Amsterdam plot
Someone had to invent the idea of the stack -
variously attributed to Turing, Bauer, Dijkstra, and Grace Hopper - the usual superheroes
Recursion had to be snuck into ALGOL-60, in a move that one of the
committee members (Bauer) actually referred to as the
"Amsterdam plot"! (Perpetrated in part by Dijkstra from Amsterdam)
our original find implementation
def find(self):
    if self.parent != self:
        self.parent = self.parent.find()
    return self.parent
"But this isn't recursive! You're calling a method by the same name,
but it's a different method defined on a different object!"
Each instance just has a pointer to the same function in memory -
it doesn't change, so it'd be really wasteful to keep copying it around!
in fact, to make this easier to read, let's rewrite it as a function
find, re-written as a function
def find(item):
    if item.parent != item:
        new_parent = find(item.parent)
        item.parent = new_parent
    return item.parent
this is the same code as our original method, but it should be easier
to read as a function, so we can see really easily when we're passing
the item's parent into the same find() function.
So now we can see the problem - if we don't have a stack, then find
is fucked - it'll clobber over itself in memory.
memory clobbered, without the stack
def find(item):
    if item.parent != item:
        new_parent = find(item.parent)
        # aw hell, item now points to
        # the original item.parent!
        (item.parent).parent = new_parent
    return (item.parent).parent
Once upon a time, the idea was that each function would have some fixed
area of memory that was that function's own area. And so you couldn't
do recursion, because when you called the function from within itself,
the second call's variables would clobber over the first call's.
So, how could we rewrite find if we didn't have the stack to
save us from this problem?
find, as a function, de-optimized
def find(item):
    if item.parent == item:
        return item
    else:
        return find(item.parent)
earlier, we looked at the fancy version of find,
which does what's called "path compression". It had this side effect
- it updated the item's parent whenever we called find and discovered
that the item's root was different from its current parent.
We can do without that - it'll hurt our performance, but it'll work.
there's no data that needs to be preserved after the
recursive call - that's what makes it "tail recursive".
tail recursion can be transformed into a loop
def find(item):
    current = item
    while current.parent != current:
        current = current.parent
    return current
Here's the new version!
We've just replaced tail recursion with a loop. No stack needed!
Tail Call Optimization
(some compilers will transform recursive functions into loops for you!)
Some smart compilers will do that for you, which is called "tail
call optimization".
gcc will supposedly even do this for some non-tail recursive
functions! How weird and cool is that?
via http://ridiculousfish.com/blog/posts/will-it-optimize.html
python won't
“
in many cases... the elimination of stack frames doesn't do anything for the algorithmic complexity of the code, but it does make debugging harder
”
- Guido van Rossum,
http://neopythonic.blogspot.com/2009/04/final-words-on-tail-calls.html
Python won't - Guido is very explicitly opposed to tail call optimization,
and you can read the blog post he wrote explaining why he won't include
it in Python
ruby will, if you configure it to do so
RubyVM::InstructionSequence.compile_option = {
  :tailcall_optimization => true,
  :trace_instruction => false
}
MRI (standard) Ruby will do tail call optimization, but only if you
explicitly configure it to do so.
scala can warn you if you get in its way
import scala.annotation.tailrec

@tailrec final def find: DisjointSetItem = {
  if (parent == this) {
    this
  } else {
    parent.find
  }
}
Scala will not only do tail call optimization for you, it has an
annotation you can use to have the compiler throw an error if the annotated
method can't be transformed into a loop!
back to our original find, as a function
def find(item):
    if item.parent != item:
        item.parent = find(item.parent)
    return item.parent
Let's look at original, nicely optimized find implementation.
It has to do something *after* the inner call to find returns, so
it's not tail recursive
with path compression, transformed into a loop
def find(item):
    items = [item]
    # find the root
    while items[-1].parent != items[-1]:
        items.append(items[-1].parent)
    root = items[-1]
    # set each node's parent as the root
    for i in items:
        i.parent = root
    return root
The basic idea of this code is that we just keep a list of all of the
items that need updating, and then once we know what the final result
is (after going through our loop), we go back and update them.
Now, we've done some "extra" memory allocation here, to hold the list of values.
But in the old version, we were doing that extra allocation to hold the stack.
It was just hidden from us.
But could we do actual recursion without having a stack?
(instead of translating it into a loop)
we could just build our own stack
def find(item):
    return find_recursive(item, [])

def find_recursive(item, children):
    if item.parent == item:
        return item
    children.append(item)
    root = find_recursive(item.parent, children)
    item = children.pop()
    item.parent = root
    return item.parent
So what if we create an explicit inverse for our find function?
append/pop should do the trick.
this is a very erlangy sort of a pattern
before each recursive call, we append the current item to the list
of children, which is now our stack. as we return from
each recursive call, we take that most recent item off the end of
children and set its parent before returning.
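Here's a self-contained, runnable version of the explicit-stack find (note the recursive call has to descend to item.parent, and the root case just returns the item), plus a quick sanity check on a three-node chain:

```python
class Node:
    def __init__(self):
        self.parent = self  # each node starts as its own root

def find(item):
    return find_recursive(item, [])

def find_recursive(item, children):
    # base case: a root is its own parent
    if item.parent is item:
        return item
    # push the current item onto our hand-rolled stack...
    children.append(item)
    root = find_recursive(item.parent, children)
    # ...and pop it on the way back out, compressing the path
    child = children.pop()
    child.parent = root
    return root

# chain: c -> b -> a
a, b, c = Node(), Node(), Node()
b.parent = a
c.parent = b
root = find(c)
```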
What if we had no variables?
"Variable" can just mean: a name for a value. So, "no variables" would
mean "assembly language", which just has registers and memory
locations.
But a more interesting definition of "variable" is "mutable memory
location". Languages like Erlang and Haskell, more or less, don't have variables
of this sort. You can create new values, and they can point to old values.
But once you bind a "variable" to some value, you can't ever change it or bind
it to a different value.
Since I don't want to get caught up in Haskell's other funny stuff or
distract you with weird syntax, I'm going to keep this in Python.
our original union implementation
(this is all about the side effects!)
def union(self, other):
    my_root = self.find()
    their_root = other.find()
    if my_root == their_root:
        return
    elif my_root.rank < their_root.rank:
        my_root.parent = their_root
    elif my_root.rank > their_root.rank:
        their_root.parent = my_root
    else:
        their_root.parent = my_root
        my_root.rank += 1
The first question is: if our union function can't change anything,
then how do we later notice that it's happened?
I mean, this is all about the side effects!
Instead, we're going to change our API around a bit. Instead of
talking about individual items, we're going to talk about the disjoint
set forest as whole.
an api that accommodates immutable data
class DisjointSetForest:
    def make_set(self, new_item_payload):
        # ...
        return DisjointSetForest(*args)

    def union(self, item_1_id, item_2_id):
        # ...
        return DisjointSetForest(*args)

    def find(self, item_id):
        # ...
        return item_root, DisjointSetForest(*args)
We need a new method make_set, which takes the place of our old
constructor.
This time, instead of just creating a new node, it creates the node and then
returns a new version of the forest, with the new node as part of the new forest.
Union also has to return a new forest, one where these two
nodes have already been joined.
In fact, as you can see, *every* method has to return a new forest.
And finally - and this is weird! - find has to return both what was
found, and the updated forest. This is a place where our API breaks
down a little bit, because find is, from the interface perspective,
not a place where we're modifying anything. It's just a query.
every change returns
an entirely new forest
If we can't mutate the old forest, with each change, we have to return a new
forest incorporating that change!
make_set takes the forest and a new node, wraps the new node in a
new DisjointSetItem, and returns a new forest that is the same as
the original forest except it also includes the new node
union takes the forest and the IDs of two items in that forest,
and returns a new forest that is the same as the original forest
except that the two nodes identified by those IDs have been joined
into the same set
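To make this concrete, here's a minimal runnable sketch of the immutable API. The method names match the slide, but the bodies are assumptions: integer IDs are indices into a tuple of items, and rank bookkeeping and path compression are skipped to keep it short:

```python
class Item:
    # immutable by convention: we never reassign these fields
    def __init__(self, payload, parent, rank):
        self.payload = payload
        self.parent = parent  # parent is an index into the forest's items
        self.rank = rank

class DisjointSetForest:
    def __init__(self, items=()):
        self.items = items

    def make_set(self, payload):
        # the new item's ID is its index; it starts as its own root
        new_item = Item(payload, len(self.items), 0)
        return DisjointSetForest(self.items + (new_item,))

    def find(self, item_id):
        # no path compression here, so the forest comes back unchanged
        while self.items[item_id].parent != item_id:
            item_id = self.items[item_id].parent
        return item_id, self

    def union(self, item_1_id, item_2_id):
        root_1, _ = self.find(item_1_id)
        root_2, _ = self.find(item_2_id)
        if root_1 == root_2:
            return self
        # rebuild only the losing root, pointing it at the winner
        old = self.items[root_2]
        new_child = Item(old.payload, root_1, old.rank)
        items = (self.items[:root_2] + (new_child,) +
                 self.items[root_2 + 1:])
        return DisjointSetForest(items)

forest = DisjointSetForest()
forest = forest.make_set('12 Main St')   # id 0
forest = forest.make_set('34 Elm St')    # id 1
forest = forest.make_set('56 Oak Ave')   # id 2
forest = forest.union(0, 1)
root_a, forest = forest.find(0)
root_b, forest = forest.find(1)
root_c, forest = forest.find(2)
```

Every call hands back a new forest, and the caller just threads it along - which is exactly the pattern the next slides dig into.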
performance in an immutable world
whoa, that's a lot of copying stuff around! how can we live this way?
the clojure devs out there are smiling, because this is just how it is -
there's probably a Datomic talk this week, right?
we're going to need a way to store the nodes that's going to be easy to copy. We're
going to be doing a lot of copying, because where we previously
changed things in place, we now have to return a new copy.
Shallow Trees
can approximate
Arrays
(like Clojure's persistent vectors!)
We're going to go with arrays.
we mean the abstract data type, not "a consecutive sequence of mutable memory
cells" - so, immutable lists indexed by number that can be accessed at an arbitrary
index in approximately O(1) time.
Clojure implements something like this with its persistent vectors!
Bit-partitioned Tries
Clojure implements these with 32-ary trees. Arrays, of
course, are O(1) to read from an arbitrary index or write to an
arbitrary index. Trees aren't O(1) for anything.
But 32-ary trees aren't terrible. If they're reasonably well-balanced, they're nice &
shallow, which gets us close enough to constant time for Rich Hickey's purposes
tries / prefix trees: the keys are usually letters, here they're digits - and using bits gives us faster operations
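To get a feel for bit-partitioned tries, here's a toy 2-ary version (Clojure uses 5 bits per level, i.e. 32-way branching; everything else here is an illustrative assumption). Each level of the trie consumes one bit of the index, and updating a slot copies only the path down to it:

```python
DEPTH = 3  # 2**3 = 8 slots; Clojure uses 5 bits (32-way) per level

def full(depth, value=0):
    # build a complete trie with every slot holding `value`
    if depth == 0:
        return value
    child = full(depth - 1, value)
    return (child, child)

def get(node, i):
    # use one bit of the index per level to pick a branch
    for level in reversed(range(DEPTH)):
        node = node[(i >> level) & 1]
    return node

def assoc(node, i, value, depth=DEPTH):
    # path copying: rebuild only the nodes on the way down to slot i,
    # sharing every untouched subtree with the old trie
    if depth == 0:
        return value
    bit = (i >> (depth - 1)) & 1
    children = list(node)
    children[bit] = assoc(node[bit], i, value, depth - 1)
    return tuple(children)
```

An update gives you a whole new trie, but almost all of it is the old trie's memory, shared.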
copying persistent vectors
(lots of shared memory!)
Cool, what does that get us?
Well, we can copy our immutable "arrays" really quickly, by which I mean create a new
tree but with one element different, which is basically what we'll want to do all the
time now. It's fast, and it shares most of the memory with the previous version
of the tree.
Clojure has a bunch of nice optimizations, but this is the basic idea.
You can read more about
Clojure's persistent vectors here:
http://hypirion.com/musings/understanding-persistent-vector-pt-1
For now, rather than switching to a distractingly different syntax,
I'll just keep using Python. We can pretend that Python's tuples are Clojure's
persistent vectors, immutable and rather beautifully implemented.
our immutable items
class DisjointSetItem:
    def __init__(self, payload, parent, rank):
        self.payload = payload
        self.parent = parent if parent is not None else self
        self.rank = rank
we're going to pretend that once we assign these fields, they can never be modified. When we need to make changes,
we just make a new DSI
no more instance methods - find and union are now
implemented on the forest, not on any items in the forest.
so, we keep replacing items with new ones instead of changing them, great, but
how do the other items know about this? we're replacing them, but... where?
losing track of things...
since we're no longer passing
around DisjointSetItems, how do we keep track of changes in an item's parent?
whoops!
Before, it was easy: each node held a pointer to its parent, so if the parent
changed, its children would still be referencing the changed parent and it'd be
no big deal.
Now, if a node's parent changes, we just have a totally new object, and the old
pointer isn't helpful anymore! We're facing approximately the same problem as if
we didn't have pointers at all!
there's more to it than that, but it's conceptually pretty similar!
the forest can keep an "array" of items
(with indices as their IDs!)
We could do something fancy with a dictionary (which we might implement as a
search tree), but to keep the slides short, let's stick with this "array" idea
(which is really a 32-ary tree anyways).
We'll use integer IDs for our items once we store them in the forest, and those
IDs can be array indexes.
When we add a new item, its ID will be the max_id of the new forest we return.
make_set, without variables
def max_id(self):
    return len(self.items) - 1

def make_set(self, payload):
    new_item = DisjointSetItem(payload,
                               None,  # parent
                               0)     # rank
    new_items = self.items + (new_item,)
    return DisjointSetForest(new_items)
So we can see now how make_set has to work!
We create a new DisjointSetItem, which has that item passed in as its
payload, no parent, and zero rank.
We create a new array, which has all of the previous items plus this new item.
And we create a new DisjointSetForest, which has the new array of items.
If we want to get our new item's id, which is its index into the array of items
in the forest, we can just call .max_id on the new forest we've returned.
Again, with Python tuples this is going to be inefficient. But if we had some
tree-based immutable list structure, it would be, well, still somewhat inefficient,
but not terrible.
union, without variables
def union(self, item_1_id, item_2_id):
    # ... (find each item's root and its ID)
    if root_1.rank < root_2.rank:
        return self.replace(root_1_id, root_2_id)
    elif root_1.rank > root_2.rank:
        return self.replace(root_2_id, root_1_id)
    else:
        return self.replace(root_2_id, root_1_id, 1)
Our new union function starts about the same as the old one,
but instead of changing any existing data structure, it returns
a new updated DisjointSetForest
def replace(self, index, root_index, incr=0):
    existing = self.items[index]
    new_item = DisjointSetItem(
        existing.payload,
        root_index,
        existing.rank + incr)
    new_items = (self.items[:index] +
                 (new_item,) + self.items[index + 1:])
    return DisjointSetForest(new_items)
And here's the actual replacement function.
We grab the item that needs to point at a new parent, and update its parent ID
to the new root index we're passing in.
Then we create a new array of items that has our updated version in place of the
old one, and return a new forest wrapping it
In the roof of each item, the number you see is the index of its parent -
since we can't hold onto a reference to a mutable object, we're holding onto
the index of the parent in that array of items in the forest.
if the item at that parent index is replaced, fine, the replacement is now our item's parent,
since an item only holds onto its parent by index - aka ID, aka a pointer to it.
I'm not going to implement find, because you get the idea
...oh hey, we've invented pointers!
If we didn't have pointers, we'd have to do something similar - create
and hold onto a data structure that we could index into, and use
those indices to refer to objects' locations in our data structure.
Except there's a bunch of more complicated memory management stuff
we'd also have to do there that I don't really have time to get into
here, unfortunately!
What if we couldn't return from functions?
This whole time, we've been assuming that you call a function, and
when you're done, you return from that function, clutching a bloody
return value in your clenched fist. But what if you could never go home again?
If it was just about not returning values, we could do all side-effects, all the time.
But if you couldn't return at all..?
Then you would have no choice but to journey onward, ever deeper into
the call stack. How could that work?
We'd just go on and on, my friend...
def Constructor(self, var1, var2, continuation):
    self.var1 = var1
    self.var2 = var2
    continuation(self)
Let's take a simple constructor...
now you pass a "continuation" into your constructor, along with
whatever else you would want to pass. And then instead of returning,
you just call continuation with whatever you would ordinarily return.
A continuation is just a function that represents the rest of your
program. I'm going to pause here and let that sink in, because it's
weird.
Continuation Passing Style
Actually, in some sense, it's simpler than ordinary function
calling.
Ordinarily, you have to call, and then return, and keep
track of a stack. With continuation passing style, there's no return,
so there's no stack.
You could compile this down to a bunch of goto statements!
“
we should... make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.
”
- Dijkstra, Goto Considered Harmful
our original island-counting program
items = {}
for node in nodes:
    items[node] = DisjointSetItem(node)

for node in nodes:
    dsi = items[node]
    for edge in node.outgoing_edges:
        end_node_dsi = items[edge.end_node]
        dsi.union(end_node_dsi)
let's look at our island-counting program again
first we go through and call union on each pair of connected nodes.
Once we've done that, every island is a disjoint set / a separate tree.
So then we can easily figure out if two items are in the same set by
calling find on each of them - if their roots are the same, they're
in the same tree (and thus in the same set); if not, they're in different
trees (so, different sets).
let's just look at the first part...
items = {}
for node in nodes:
    items[node] = DisjointSetItem(node)
Now let's translate this to CPS.
without iteration or mutable variables
def process_node(items, nodes, index):
    if index == len(nodes):
        return items
    node = nodes[index]
    items = items.add(node,
                      DisjointSetItem(node))
    return process_node(items,
                        nodes,
                        index + 1)

# process all the nodes:
items = process_node({}, nodes, 0)
straightforward if long-winded:
1) if we've already processed all the nodes, we're done - return our finished
map of items
2) otherwise, we wrap that node in a DSI and add it to the items map
with the node itself as the key (so we can find it later on), and recurse on to
the next node in the list
no big deal, we've done this sort of stuff before
in continuation passing style
def process_node(items, nodes, index, k):
    if index == len(nodes):
        k(items)
    node = nodes[index]
    def add_and_continue(item):
        add_item(items, node, item, rest)
    def rest(items):
        process_node(items, nodes, index+1, k)
    DisjointSetItem(node, add_and_continue)
And now we get to the batshit insane continuation-passing style version
Holy shit, what happened to our code?! Let's go through this in chunks...
when we're done, we just go deeper
def process_node(items, nodes, index, k):
    if index == len(nodes):
        k(items)
    # ...
So, here, where we would have returned, we just pass our map on to the
next bit of code, this continuation (which, by convention, is
called k).
Moving on, we grab the current node, and define a few helper functions:
our first helper function
node = nodes[index]
def add_and_continue(item):
    add_item(items, node, item, rest)
The first helper function can't call add on the map, because this time it has to
pass in a continuation as the last argument.
So instead we'll have to define and use a function that takes a map, a key, a
value, and a continuation function
The continuation ('rest') will get called inside of add_item with the newly created, larger map.
our second helper function, which contains the recursion
def rest(items):
    process_node(items, nodes, index + 1, k)
The second continuation does the recursion, which was at the end of
our original function.
Since it's at the end of this process, we have to pass in
the original continuation, which takes the place of the return statement and
holds the rest of our program.
k will get passed through each recursive step until it's called at the end
we pass a node & a continuation to our constructor
DisjointSetItem(node, add_and_continue)
Finally, we call our constructor with the first helper function, to get this
show on the road
I'm not going to write out what the constructor now looks like, but it takes a
continuation - add_and_continue - which also acts as a closure that holds
references to the rest of the nodes, the rest of the items, and the continuation
that holds the rest of our program
Feel free to grab me later if this is all gibberish and you want to chat about
closures in the hallway, by the way!
def process_node(items, nodes, index, k):
    if index == len(nodes):
        k(items)
    # node = nodes[index]
    # def add_and_continue(item):
    #     add_item(items, node, item, rest)
    # def rest(items):
    #     process_node(items, nodes, index+1, k)
    # DisjointSetItem(node, add_and_continue)
So let's walk through how this would get executed in the case of a
one-element list.
First, we check if we're at the end.
def process_node(items, nodes, index, k):
    # if index == len(nodes):
    #     k(items)
    node = nodes[index]
    # def add_and_continue(item):
    #     add_item(items, node, item, rest)
    # def rest(items):
    #     process_node(items, nodes, index+1, k)
    DisjointSetItem(node, add_and_continue)
We're not, so we look at the first node.
By the way, we're not going to worry about passing a continuation when indexing
into the nodes array, because ultimately indexing into arrays is just pointer arithmetic
We create a DisjointSetItem from this node, passing in our helper function
add_and_continue, which will be called with the new DSI created by the constructor.
def process_node(items, nodes, index, k):
    # if index == len(nodes):
    #     k(items)
    # node = nodes[index]
    def add_and_continue(item):
        add_item(items, node, item, rest)
    # def rest(items):
    #     process_node(items, nodes, index+1, k)
    # DisjointSetItem(node, add_and_continue)
add_and_continue creates a new map containing only a mapping
from this node to the item, and passes it to rest.
def process_node(items, nodes, index, k):
    # if index == len(nodes):
    #     k(items)
    # node = nodes[index]
    # def add_and_continue(item):
    #     add_item(items, node, item, rest)
    def rest(items):
        process_node(items, nodes, index+1, k)
    # DisjointSetItem(node, add_and_continue)
rest passes the map into a recursive call to process_node, with index = 1.
def process_node(items, nodes, index, k):
    if index == len(nodes):
        k(items)
    # node = nodes[index]
    # def add_and_continue(item):
    #     add_item(items, node, item, rest)
    # def rest(items):
    #     process_node(items, nodes, index+1, k)
    # DisjointSetItem(node, add_and_continue)
And since we're at the end this time, since index = 1 and we only had 1 node to
process, we finally call k, which is the rest of our program, passing in the
new items map.
Does continuation passing style
actually work?
Does this CPS stuff actually work?
Not in Python! because as we discussed earlier, Python doesn't have tail-call
optimization! So no matter how you write it, endlessly calling functions will
just add frames to the stack with each call, and for any reasonably complex
program you'll end up running out of room on the stack.
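You can watch this happen: even a perfectly tail-recursive, CPS-flavored function overflows in Python once it recurses past the interpreter's limit. A minimal demonstration (the toy countdown function is my own, not from the talk):

```python
import sys

def countdown_cps(n, k):
    # a tail call, but CPython still pushes a frame for every step
    if n == 0:
        return k(n)
    return countdown_cps(n - 1, k)

overflowed = False
try:
    countdown_cps(sys.getrecursionlimit() + 100, lambda n: 'done')
except RecursionError:
    overflowed = True
```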
avoiding stack overflow
A common way to work around this is to use a trampoline!
avoiding stack overflow
def trampoline(k):
    while callable(k):
        k = k()
    return k
The idea is that instead of each function calling the continuation, it returns
(I know, oh cheating sadness! - let's say, "bounces back") a function to the
trampoline, which is a loop that calls all of the continuations for us.
Which is all a bit silly, because continuation passing style obviates the need for
a call stack anyways - the point of a stack is to allow you to return to a previous
frame without having changed its context, but here you can never return, so there's
no need to preserve any previous frames
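Here's a minimal self-contained sketch of the bouncing in action (the toy countdown function and names are my own assumptions, repeated with the trampoline for completeness):

```python
def trampoline(k):
    # keep bouncing until we get a non-callable final value
    while callable(k):
        k = k()
    return k

def countdown_cps(n, k):
    # return a thunk instead of recursing; the trampoline makes the call,
    # so the Python stack never gets more than a couple of frames deep
    if n == 0:
        return lambda: k('done')
    return lambda: countdown_cps(n - 1, k)

result = trampoline(lambda: countdown_cps(100000, lambda x: x))
```

100,000 "recursive" steps, far past Python's recursion limit, and no stack overflow.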
so, what have we accomplished?
Well, we've replaced two perfectly good lines of code
with a mess of incomprehensible gibberish. Win!
I'm being facetious, of course. CPS is pretty cool! Some
compilers (e.g. Chicken Scheme) use it as an intermediate representation, because it's dead
simple - do one thing, goto, do another, goto.
Thing is, it's cool for compilers - it's a super clear and precise control flow! - but
pretty terrible for humans - hard to follow, Dijkstra was right (shocking, no?)
Let it go!
(We don't need that stuff, anyway!)
Most of all, though, this showed us that even if we get rid of most of our
programming tools: mutable variables, the stack, and returning from
functions, we can still do everything we want.
Don't Take it for Granted
(thank you!)
@DanielleSucher
dsucher@gmail.com
http://daniellesucher.com
Go ahead, take it for granted! If we lost all this stuff that we rely on in the
languages we use every day, so what? We'd get over it. We could rebuild what
we lost (and make it bigger, stronger, faster! &c)