Fast weak key dict

What follows is performance trick to make cache dictionaries as fast as possible. This addresses a specific case when you create an “addon” or proxy objects and you want at most one such proxy per object. (Another common use is attaching some metadata to the object)

The way to accomplish this is to maintain the object → proxy registry and create new proxies on-demand. However, such a registry end up holding a reference to every object used. This is really bad, because it’s pretty much a memory leak that a garbage collector cannot fix.

To fix that one can use weakref.WeakKeyDict. However, that class is much slower that the builtin dict, so sometimes it’s worthwhile to do the following instead:

from weakref import ref
reg = {}
def make_proxy(ob):
    try:
        return reg[ref(ob)]
    except KeyError:
        return reg.setdefault(ref(ob, reg.pop), proxy(ob))

What makes it work are two things, one is that the weakrefs accept a callback that will get called when the object is destroyed. The argument to the callback is the weakref, so what the ref(ob, reg.pop) does is create a reference that (provided it’s used as a key in reg dictionary) will remove itself from that dict once the object it refers to is collected. See, the reg.pop is the callback and it will receive the reference we constructed. It’s not obvious, so take a minute to figure it out, if necessary.

The other thing is that weak references are compared (and hashed) based on the object they reference, the callbacks are not compared. What that means is that ref(ob) == ref(ob, reg.pop), so the keys in the try and the except blocks are equal.

One last thing to remember, is that if proxy holds a reference to ob, that reference should be weak, otherwise there’s a non-collectable reference cycle.

You can also create a similar pattern analogous to WeakValueDict, depending on how the proxies and wrapped objects are used and need to be garbage collected.

For some additional discussion see original Google+ post.