There is a post at Less Wrong today called “Fixing akrasia: damnation to acausal hell”.
Akrasia means weakness of will. It’s a recurring theme on Less Wrong – it has its own entry at the LW wiki – for two reasons. First, self-improvement is part of the LW culture. Second, there are activities that are overwhelmingly important but which get hardly any social support (e.g. Friendly AI, cryonics), so unusual measures are required for people to keep at them.
Precommitment is another concept with a special aura in LW-land, coming from its role in decision theory. If an agent can genuinely guarantee in advance that it will act in a certain way, that affects how other agents will model it, which can in turn have desirable consequences.
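To make that game-theoretic point concrete, here is a toy sketch using the classic game of Chicken. The payoff numbers and function names are my own illustration, not anything from the decision-theory literature discussed here; the point is only that fixing one player’s action in advance changes the other player’s best response.

```python
# Toy illustration of precommitment, using the game of Chicken.
# Payoff matrix (hypothetical numbers): entry is (A's payoff, B's payoff).
PAYOFFS = {
    ("swerve", "swerve"):     (3, 3),
    ("swerve", "straight"):   (1, 4),
    ("straight", "swerve"):   (4, 1),
    ("straight", "straight"): (0, 0),  # mutual disaster
}
ACTIONS = ("swerve", "straight")

def best_response_of_b(a_action):
    """B's payoff-maximizing reply, once A's action is taken as fixed."""
    return max(ACTIONS, key=lambda b: PAYOFFS[(a_action, b)][1])

# If A can genuinely precommit to "straight", B models A as certain to
# drive straight, so B's best response is to swerve -- and A then gets
# the highest payoff in the matrix.
b_reply = best_response_of_b("straight")
a_payoff = PAYOFFS[("straight", b_reply)][0]
print(b_reply, a_payoff)  # -> swerve 4
```

Without the guarantee, B has no reason to treat “straight” as fixed, and the commitment confers no advantage – which is why the precommitment must be genuine and ironclad for the analysis to go through.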
The author of today’s post observes that Roko’s Basilisk involves “precommitting” to work at creating the future AI that will punish you if you don’t measure up. But he also observes that human beings cannot presently precommit in the comprehensive, ironclad, absolute way supposed in game-theoretic analyses. His conclusion is that “fixing akrasia” is actually risky, because it would enable the kind of psychologically absolute commitment required for the demonic deal to go through.
At one level, I’m charmed by the drama and the mental acrobatics here. It turns out that one of the great goals of “extreme-rationalist” self-enhancement actually opens the way to one of its great bogeymen! If this were science fiction, it would be a great plot twist. This is quality conceptual entertainment, just like Roko’s original posts. It’s a worthy extra twist to the whole affair.
At another level, I think it’s crazy to actually be worrying about this. I’m not too worried that someone is worried about it, because reality contains too many bizarre possibilities, and there’s always someone somewhere who freaks out over them. People worry about the multiverse, about determinism, about solipsism, you name it.
But mostly, I was interested to see the basilisk tiptoeing back into the discourse. It is named nowhere in the post, and the author endorses its banishment from public discussion, but the post is nonetheless all about such “risks” of “acausal trade”. It will be interesting to see whether the post is allowed to stand.
The RationalWiki page on the basilisk, and the accompanying talk page, last saw action almost a month ago. And I can see no further references to the basilisk on Twitter in that same period.
Just a few weeks ago, the Less Wrong basilisk was still an obscure concept, notable mostly as a mysterious in-joke at the forum where it first briefly saw light. I considered myself to be moving slowly towards a formal refutation that could be posted here; and then maybe the moderators there would finally permit open discussion.
Today the basilisk is still obscure, but it has definitely escaped control. Twitter tells me there have been 14 tweets about it in the past 24 hours. It’s the top “bookmark” in the latest blog post by the novelist Warren Ellis. This is all due to the RationalWiki article, “Roko’s basilisk”, a spin-off of the RW article on Less Wrong. I had imagined that the basilisk might get its 15 minutes of subcultural fame, and then go on to become a meme increasingly divorced from its original form; but I didn’t imagine that it would happen this quickly.
There is an increasingly common account of the basilisk, in which fear of the future AI is supposed to be based on straightforward prediction of the future, plus a little self-referential hocus-pocus.
Here is an analogy for this popularized version of the basilisk:
Straightforward prediction of the future: “Climate change is going to kill large numbers of people, so obviously there will be a search for villains to punish. There will be tribunals, and they will be especially harsh on people who knew what was coming, and who could have made a difference, but didn’t.”
Self-referential hocus-pocus: “If that all happens, dear reader, the tribunal will be especially hard on you – unless you devote your life to saving the climate – because unlike most people, you knew that the tribunal and its punishments were coming – because I just told you so.”
The current version of the new RationalWiki page devoted to the basilisk conforms to this pattern. It may also be seen in an October 2011 “explanation” at “Bo News”, which was highly upvoted in a /r/skeptic thread at reddit devoted to the basilisk: “People will build an evil god-emperor because they know the evil god-emperor will punish anyone who doesn’t help build it, but only if they read this sentence.”
There is a sense in which Roko’s basilisk really does consist of these two parts, a straightforward prediction and a piece of self-referential hocus-pocus. Future AIs are anticipated to exist, because of the general advance of computational and algorithmic power; that’s a straightforward prediction. (Whether they are likely to have the further specific traits ascribed to them in Roko’s scenario is another matter.)
However, the mechanism of the self-referential hocus-pocus (circular logic, a self-fulfilling prophecy) is badly understood, or not understood at all, by latecomers to the basilisk saga. For example, the commentator at Bo News says “… you work on the AI now because the AI in the future will reward/punish you, which in lesswrong logic means the AI is actually controlling the past (our present) via memes”. Still, at least that commentator grasped that something odd was being asserted about causality in the basilisk scenario. Nitasha Tiku’s “Faith, Hope, and Singularity” misses this hocus-pocus element entirely – though she can hardly be blamed for missing it, given the cult of secrecy surrounding the basilisk.
There has recently been a small renaissance of basilisk discussion, at Reddit and RationalWiki. The objective of this post is just to point out that the original basilisk was based on a rather specific hocus-pocus, which may be summed up in the words “acausal trade” and “timeless decision theory”. I won’t try to define those terms right away, let alone examine their credibility as concepts; I just want to point out that the popularized basilisk has largely devolved into straightforward fear of punishment by a future AI, whereas the original basilisk was something a lot weirder.
Having put all my words back into public view, I don’t know when I’ll get back to this blog. While the official response to the basilisk has been an embarrassing mistake, the idea emerges from an interesting context. I may eventually want to talk here about the general issues surrounding “timeless decision theory”; but it’s not a priority. So for now, I’ll finish up with an “open thread” post, where readers can ask questions and otherwise unburden themselves of their thoughts.
For the reader who wishes to understand what this is about, here are all the comments I’ve written about the “basilisk”. Most of them have been “censored”, meaning that they still exist on the site, but the viewing permissions have been altered by a moderator so that only the author (me) can see them. As censorship goes, it’s relatively minor, but this topic simply shouldn’t be censored at all, because no one has anything to fear from it.
The basilisk was introduced in a post in mid-2010, so my first comment was made in the context of discussion of the post (indented words were written by someone else; I am quoting and responding). Later the post was “hidden”, in the way I described above, and that was the beginning of the basilisk saga, which has continued, on and off, for over two years.