Toronto LW Singularity Discussion, 2012-03-09

Present: SB, SD, GE, SF, EJ

Minutes: GE (note that when I’ve written something down that doesn’t make sense to me afterwards, I’m sometimes leaving it out here. Please let me know if you think I’ve missed out an important point)

The starting point of the discussion was Yudkowsky’s notion of an optimisation process, in particular the posts Optimization and the Singularity and Observing Optimization.

Here are my notes from beforehand:

Yudkowsky’s view:

My view (actually I didn’t get to communicate this in the meeting, sorry):

  • Thermodynamics – to hit a small target, must start from smaller platform?
  • We are limited in the kinds of property we can expect from all optimisation processes in general. When it comes to programmable optimisation processes, someone could write one to do the exact opposite of what we expect (e.g. the opposite of hitting a small target would be an entropy-maximiser)
  • instead define in terms of who beats who, or its success in a wide range of environments?


Concepts:
  • optimisation process
  • optimisation power
  • intelligence as optimisation power divided by resource usage
  • recursively self-improving optimisation process
  • which features of self an optimisation process can improve, and how quickly
  • goal stability
  • friendliness
  • “Friendly AI” as a particular approach to friendliness
  • coherent extrapolated volition
  • singleton
  • programmable optimisation process (AGI may be programmable, evolution not)
  • meme (actually I’m interested in a generalised notion – any kind of information that undergoes copying, mutation and selection. Genes would be included here).
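The generalised replicator notion in the last bullet – any information that undergoes copying, mutation and selection – can be sketched as a minimal loop. Everything below (bit-strings, the fitness function, the rates) is my own illustrative choice, not anything discussed at the meeting:

```python
import random

def evolve(population, fitness, generations=50, mutation_rate=0.05, seed=0):
    """Minimal copy / mutate / select loop over bit-string replicators.
    Anything supporting these three operations counts as a replicator
    in the generalised sense -- genes, memes, brain patterns."""
    rng = random.Random(seed)
    for _ in range(generations):
        # Selection: fitter strings leave more copies.
        weights = [fitness(s) for s in population]
        population = rng.choices(population, weights=weights, k=len(population))
        # Mutation: each bit occasionally flips during copying.
        population = [
            "".join(b if rng.random() > mutation_rate else str(1 - int(b)) for b in s)
            for s in population
        ]
    return population

# Toy run: fitness rewards 1-bits, so the population drifts toward all-1s.
final = evolve(["0000"] * 20, fitness=lambda s: 1 + s.count("1"))
```

The point of the abstraction is that nothing in the loop cares what the strings represent – substitute genomes, slogans or em brain patterns and the dynamics are the same.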

SD starts by mentioning that the Journal of Consciousness Studies has published a bunch of stuff on the singularity. This obviously provides material for our group to digest. SD has also read Chalmers (2010). GE has read it too! And copied down some notes (for the opinion database), which he forgot to tell everyone about. Since the database isn’t up and running yet, I’ve put them on my blog for now.

SD thinks that Chalmers (2010) is a good intro to the Singularity as it’s all there in one document rather than scattered across many sources. GE says that it’s encouraging to see a certain amount of agreement from someone somewhat outside the Singularity Institute/Future of Humanity Institute bubble – possible evidence that it’s a sanity cluster rather than a craziness cluster?

SD says that some of the Journal of Consciousness Studies singularity material is interesting and some is ridiculous. GE says we must be careful when drawing our “ridiculousness” boundary. (If we’re too conservative then we end up only admitting material which already agrees with our worldview. If we’re too liberal then we end up wasting a lot of time on nonsense).

GE goes over the concepts listed above.

Intelligence (Yudkowsky’s definition)

GE remembers Yudkowsky defining intelligence as optimisation power divided by resources used (actually I got this from Muehlhauser/Salamon – Intelligence Explosion: Evidence and Import. They say it comes from “Yudkowsky (2008b)” but I didn’t manage to find the exact reference). The idea is that in general, you can give a system greater optimisation power by giving it more resources. But the amount of optimisation power it can have for a given amount of resource input is constrained by this intangible thing called “intelligence”, which in turn is constrained by the level of technological development.
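As a toy worked example (the units and numbers are mine, not from any of the papers): if optimisation power is measured Yudkowsky-style in bits – how improbably small a target region the system steers the world into – then the definition just divides those bits by resource cost:

```python
import math

def optimisation_power_bits(target_states, total_states):
    # Bits of optimisation: log2 of how improbable the hit target is.
    return math.log2(total_states / target_states)

def intelligence(target_states, total_states, resources):
    # The definition GE recalls: optimisation power per unit resource.
    # "resources" is an abstract cost (compute, energy, time) -- the
    # units here are purely illustrative.
    return optimisation_power_bits(target_states, total_states) / resources

# Two systems hit the same 1-in-a-million target region (~19.9 bits)...
frugal = intelligence(1, 10**6, resources=1)
profligate = intelligence(1, 10**6, resources=10)
# ...the one burning 10x the resources counts as 10x less intelligent.
```

This makes the point in the paragraph above concrete: adding resources raises optimisation power but leaves intelligence, on this definition, unchanged.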

Hitting a small target

GE says there are two different kinds of small target here:

  • Only a very small patch of design space consists of “optimisation processes”. An even smaller patch consists of “friendly” ones.
  • The optimisation process itself must try and hit a very small patch of possible-world space in order to achieve its goal.

These are related in the case of a recursively self-improving optimisation process. It will use its ability to hit very small targets in order to hit the very small target of creating an improved version of itself.

Goal stability

SD asks whether Friendly/Unfriendly is a judgement relative to us. Would an AI have its own distinct notion of “friendliness”? If it created an “unfriendly” AI (according to its own values), would it optimise itself out of existence? GE thinks that the terms Friendly/Unfriendly are generally used with respect to human values only. When thinking about AIs that might create other AIs with different values, we instead talk about “goal stability”.

GE thinks that AIs with unstable goals are prone to self-modification or to creating AIs which both have different goals and different levels of goal stability. If you iterate this process you’d expect to jump around goal/stability space until you end up with high goal stability. Of course there’s a third dimension to this space which is optimisation power. GE isn’t sure whether it makes sense for an AI to have a large amount of optimisation power but a small amount of goal stability.
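The jumping-around dynamic can be caricatured in a few lines: give an agent a random (goal, stability) pair and let it rewrite itself whenever its stability “fails”. High stability is nearly absorbing, so trajectories tend to end up there. All the numbers are illustrative:

```python
import random

def iterate_self_modification(steps=200, seed=0):
    """Caricature of jumping around goal/stability space: with probability
    (1 - stability) per step, the agent rewrites itself to a fresh random
    (goal, stability) pair. Stability near 1 is almost absorbing, so
    trajectories tend to get stuck at high stability."""
    rng = random.Random(seed)
    goal, stability = rng.random(), rng.random()
    for _ in range(steps):
        if rng.random() > stability:  # stability failed -> self-modify
            goal, stability = rng.random(), rng.random()
    return goal, stability

goal, stability = iterate_self_modification(seed=42)
```

Note the model says nothing about *which* goal the process ends up stable on – only that low-stability states don’t persist.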

SD asks whether it’s important in AI design to protect the “goal” against corruption, e.g. placing it in read-only memory. GE thinks that there’s the danger that you label some part of memory as the “goal”, but an agent then emerges which uses a completely different part of memory to store its actual goal. GE also says that an agent will naturally act to protect its own goal anyway.

AI containment

SD brings up the “Riemann Hypothesis catastrophe” – an AI programmed with the explicit goal of proving some difficult mathematical theorem might take over the world in order to acquire computational resources with which to do it. (This seems to be attributed to Marvin Minsky but I can’t find the original source). SF says that’s essentially the Friendly AI problem.

GE says that ensuring safety by placing restrictions on an AI’s action is somewhat dodgy due to the “AI box” problem.

EJ says we have to define what we mean by an “action”.

GE agrees and describes an AI attached directly to an arm, and another AI attached to a computer screen which has a human eye looking at it and the eye is connected via a brain to another arm. The hypothesis is that if the AI is powerful enough, these two setups look pretty much the same from the AI’s point of view; it’s just that in the second one the protocol is more complicated.

SD mentions Yudkowsky’s AI box experiment. (This succeeded on the first three trials and failed on the last two, but we don’t have any of the transcripts so the evidence is somewhat weak).

GE says that even if you could keep an AI in the box, there’s still the possibility of a source code leak, which another team could use to develop a less well-constrained AI. SB points out that even without a source code leak, the knowledge and technology required for AI is still there. GE agrees and says that after one AI is invented, it won’t be long before the next one is. The first may be safely boxed and the second not.

SF asks whether this applies to Friendly AI too. SB says that a (singleton) Friendly AI could prevent further development of AI, e.g. by seizing information on AI design.

Since this doesn’t sound like a particularly “friendly” move, it sparked off a discussion of values.


EJ asks about the distinction between manipulation and persuasion. What is acceptable manipulation? SB says that one of Yudkowsky’s laws of Fun Theory is not being ruled by a singleton.

SB brings up Coherent Extrapolated Volition – extrapolate what people’s volition would be if they had all relevant knowledge. EJ points out that people with the same knowledge won’t always come to the same conclusions (in terms of preference). Given enough information, would all people decide to eat healthily?

EJ also brings up morality vs. personal preference. SD says that in general he labels his preferences as “moral” if they affect other people.

SF brings up terminal vs. instrumental values. SB gives the example of giving medicine to someone who’s sick – if you find out the medicine doesn’t work, you stop giving the medicine, because the terminal value is removing the sickness, not administering the medicine. The difference is that instrumental values can change with new information.

SF says the hope is that terminal values cohere to some extent. SD says that we can sacrifice terminal values in order to achieve more important terminal values.

Anthropic principle

GE mentions Boltzmann brains and the anthropic principle – I honestly can’t remember why. People seemed interested though so I may do a couple of idea posts about that.

Yudkowsky’s narrative

SD relates history as a series of improvements in optimisation (I think this is taken from Yudkowsky’s side in the AI foom debate, e.g. The First World Takeover and Observing Optimization and Optimization and the Singularity).

  • Dead world
  • Life – search
  • Sex
  • Brains – directed search
  • Humans

EJ asks what counts as a brain (e.g. does C. elegans(wp) have one)? GE says what we’re most interested in is goal-directed behaviour. Whether we regard behaviour as goal-directed or just reactive is really a statement about our state of knowledge rather than the world itself. If we’re much smarter than something then we can predict its next move; if we’re not, we can’t, but we might still be able to predict what state it will leave the world in. (e.g. the chess player example. I was thinking of Yudkowsky’s Belief in intelligence here).

SD points out that some optimizations have long delays (e.g. farming causing writing).

SD points out an interesting point of contention between Yudkowsky and Hanson: Yudkowsky appears to rely on narrative construction, Hanson on established economic models.

Em world

SD brings up Hanson’s “em world” (i.e. an economy dominated by brain emulations) and Darwinism. GE points out that the unit of selection is brain patterns not genomes. SD says “plan as if Hanson’s vision is true, insure against EY being right”. (I didn’t quite get what that meant, sorry). SD also mentions Goertzel’s Nanny AI.

SF finds Hanson’s vision implausible, and asks everyone for a “Hanson vs. Yudkowsky” opinion. (We do get around to everyone, but it sparks off a lot of other discussions in the meantime).

SD isn’t sure but wants Yudkowsky to be right (I assume because Hanson’s vision seems somewhat dystopian, but Yudkowsky’s vision at least allows the possibility of constructing a utopia). SD says that Hanson doesn’t expect a singleton.

GE brings up kin selection(wp) – we would expect ems to cooperate with their clones. Would the em world turn into some kind of superorganism? What would its goals be? Would such a world become stable?

EJ wonders whether more intelligent entities would displace the old ones? SD wonders whether the differential is finer-grained preferences?

SD points out that if ems invent space travel then it will change things quite a bit. SF points out that this is a lot easier with ems than with humans, since they can be implemented on lighter and more robust hardware.

GE wonders whether the world will become populated with clones of the most successful em (i.e. whichever em is able and motivated to make copies of itself across the whole world). This em would presumably stop competitors from appearing after it had been established. But the copies need not be exact, and within that allowed space there might be a mind design that wants more exact copies of itself to take over the world. There would therefore be a trend towards greater uniformity.
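One way to see the trend toward uniformity is a toy model (mine, not from the discussion) where copy-exactness is the only heritable trait: a copy inherits its parent’s exactness with that probability, and otherwise comes out as a random variant. Faithful lineages persist while sloppy ones churn, so mean exactness climbs:

```python
import random

def mean_exactness_after(generations=100, pop_size=200, seed=0):
    """Toy model of the uniformity trend: each em is reduced to a single
    'exactness' number in [0, 1]. A copy inherits the parent's exactness
    with that probability; otherwise it comes out as a random variant.
    Faithful lineages persist; sloppy ones keep churning."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        pop = [
            e if rng.random() < e else rng.random()
            for e in rng.choices(pop, k=pop_size)
        ]
    return sum(pop) / len(pop)

m = mean_exactness_after()  # mean exactness after 100 generations
```

Note there is no explicit competition in the model at all – exact copying spreads simply because it is the only trait that reliably preserves itself, which is the evolutionary core of GE’s argument.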

SF thinks Hanson’s vision is implausible – there’s a big change and yet the basic rules of how the economy works are staying the same. In terms of timescale, SF thinks ems may come first. SB agrees with SF on these points.

Em takeoff

GE lists various ways ems can make themselves more economically efficient:

  • Creating a team of copies with the same goal – better coordination than any human social group
  • Creating better brain-to-brain communications within that team, creating a kind of supermind
  • Improving the hardware they run on
  • Removing parts of themselves that are not useful economically
  • Replacing parts that are modelled as neurons with equivalent (but much simpler) algorithms
  • Taking advantage of backups in order to allow a lot more experimentation in mind design than would be possible with human brains

SF thinks slow takeoff is more likely starting from ems than from AI. SD agrees and says that because less deep understanding is required for ems than for AI, we’d expect ems to be in less of a position to massively improve themselves.

EJ asks what we mean by fast/slow takeoff. In terms of technological progress, we are already having to adjust to things happening faster. GE agrees and thinks we should try to put numbers to timescales.

GE thinks that takeoff would start with the team with the loosest ethics (if you look at my list, it’s not all particularly ethical).

Em self-copying drive

SD observes that we are modelling ems as aliens. It won’t be like that, at least not initially – they start off as faithful copies of human minds. GE frames it in terms of the hierarchy of needs – will basic survival dominate?

SD thinks there will be no copier initially. Darwinian struggle takes over as costs go down? GE points out that the cost of scanning only stops new brains being turned into ems, it doesn’t stop copying. SD agrees and thinks this gives the initial ems an early adopter advantage.

EJ asks where this drive to self-copy would come from. Does sex drive translate into self-copying drive? SB says it only takes one em to start it off. SD imagines ems seeing each other make copies and thinking they’d better join in. Also, the “ems will make lots of copies of themselves” meme is already out there. SB imagines companies hiring ems and making copies. EJ says that if you’re an em and prove useful, people will make copies of you.

GE’s opinion is that the “drive to self-copy” is based on evolutionary arguments not psychological ones – there are various plausible psychological pathways to getting there, but once we do we’re stuck with it. GE wonders how many ems you need to have before one becomes a self-copier?
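GE’s closing question has a simple back-of-envelope shape if you assume each new em independently “turns copier” with some small probability p (p is a pure placeholder – nobody at the meeting put a number on it): the count of ems created before the first self-copier appears is geometric, with expectation 1/p.

```python
def expected_ems_before_first_copier(p):
    # Geometric expectation: if each em independently becomes a
    # self-copier with probability p, you expect about 1/p ems before
    # the first one appears. p here is a placeholder, not an estimate.
    return 1.0 / p

# e.g. at p = 1% you'd expect the first self-copier within ~100 ems.
```

The independence assumption is doing a lot of work here – SD’s point above, that ems may copy because they see others copying, would make the onset much more abrupt than a geometric model suggests.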

Robots or virtual worlds

SF asks whether ems need humans or robots to do their real-world work? EJ asks whether we’re imagining ems embodied in robots or living in virtual worlds? SD points out that even a computer is embodied in reality.

Would you survive in em world?

SD doesn’t think that he would be well-suited to em world. EJ isn’t sure that SD can be so confident.

GE lists reasons you might expect to see non-optimised entities in spite of evolutionary pressures:

  • Variation – sometimes you hit the wrong combination of genes and environment and create a dud
  • Relative mediocrity – we are so used to seeing humans in terms of how they are different from other humans that we forget we’re all pretty awesome
  • Frozen design flaws/local optima
  • In non-equilibrium conditions, being more optimised to the ancestral environment than the current one


EJ points out that we might expect ems to be specialised to different niches.

GE imagines degenerate ems – some jobs may be performed more efficiently by a sub-human intelligence (but still be filled by an em-descendent rather than traditional software). SB says they might lose consciousness in the process.

Miscellaneous thoughts on em future

SB asks how different em world would be to a future without AI. GE isn’t sure how to answer the question – exactly which of our assumptions that imply AI would we be unwinding? (Actually I think I was being unfair here – presumably at least a certain portion of our future probability pie doesn’t contain any AI, and we can ask what that slice of the pie looks like).

SD upgrades his optimism on em future. We would expect ems to be happy in their jobs? EJ says it hinges on whether they will be conscious. We already know a lot about motivation and reward in the brain.

SB says there’s a big difference between the initial em world and how it ends up.


EJ points out that humans have drives other than reproduction. SD says that differential reproduction is still what evolution optimises for. GE says that evolution is always happening on some level (genes or memes or brain patterns or some kind of information).

SD says that memes appear to have no stable underlying structure. Are we thinking of them as neurological patterns? We need a non-bullshit definition of a meme.

GE says that extrapolating a future where humans have lots of kids is to assume that genes win. SD agrees and wonders whether the demographic transition(wp) can be thought of as a transition to memes.

EJ wonders whether the same neural firing in different people leads to the same ideas. SB says it can be the same meme despite differences. SD says there needs to be some kind of correlation algorithm.

EJ says that people find it harder to distinguish yellow from orange if their language doesn’t have separate words for them. GE says there are two possible models for this: either we map visual inputs directly to language, or we map both to some kind of “concept” space, where two distinct concepts are more likely to emerge if two different words are used.

EJ asks if brain ems copy and diverge, are they still the same brain?

(This didn’t come up in the discussion, but I want to write it down before I forget… if we have lots of brain em copies then we might expect memes to spread between them much faster than they do between humans, since the brain architectures correlate almost exactly).

Mind and body

GE imagines someone disconnecting your body from your brain’s motor neurons and wiring it instead to some algorithm that vaguely approximated your behaviour. How long would it be before you noticed?

EJ mentions phantom limb pain(wp) and says that the trick is to convince the brain that it has a limb and that it’s not hurting. This reminds SB of the rubber hand illusion(wp).

