On the perils of an artificial superintelligent species

When an existential risk induces a state of intolerable anguish it can no longer be ignored. The potential extinction of humankind by the creation of a superior technological species is by now not only feasible, but highly probable, and must be scrutinised in this light. I hope, for the future of humanity, to reignite a contemplation of this basic premise. To do so I hope that many of you take the time to read these words carefully, perhaps sacrificing several minutes otherwise used pointlessly on your smartphone, for by the end of the essay you may well wish to get rid of it.

It takes a good minute, watching Ryan Gosling’s character in Blade Runner 2049 walk through his apartment engaged in small-talk with an offscreen voice, to realise that this voice does not belong to a human being. It is an artificial being with which he converses: no less adequate, physiognomically comparable, but non-biological, mechanical, artificial in the true meaning of the word. It is a being whose code was written by humans; a being not subject to the laws of nature; a being not grounded by biological compounds and restraints as primitive as the necessity for oxygen to breathe. Such artificial beings, bound instead to electrical networks and, to date, the benevolent will of human programmers, give an insight into a future we are heading towards: we are the first biological species populating the face of the earth capable of starting a technological civilisation, and appear determined to do so.

Faced with this prospect, and articulating an existential fear of its possible outcome, one may be dismissed as fearmongering. Few topics have enlivened the imaginations of science fiction authors and film-makers more than an eventual ‘doomsday’: an apocalyptic scenario of robots seizing power and condemning humans to subservience. Sombre scenes with droning ambient music and an absence of natural light unfold before our eyes, yet beneath these superficial tools a light-heartedness seeps through, reassuring us of the story’s impossibility. Elsewhere, the propagation of a potential existential catastrophe is considered a wildly ignorant regression: far from being a new phenomenon, the argument goes, technological advance has been ongoing for centuries, and to believe that I happen to exist at the zenith of positive developments, at peace with the luxuries of cars, mobile phones and washing machines but keen to declare war on future developments in the vein of self-driving cars and household robots, is an impossible position to uphold; it is outright hypocritical.

With these reproaches in mind I must make clear that neither grasps the content of the position to be argued in this essay. On the one side, there is widespread consensus among experts in the field that the day will come when a superintelligent [1] species is created. Many predict that human-level machine intelligence (HLMI) will be possible within this century, with the progress from HLMI to superintelligence being a mere formality, to be realised in a limited number of decades thereafter. [2] Hence none earnestly doubt the overarching direction, despite disagreements on the precise time frame. On the other side, it is indeed possible that we transcended into inimical territory long ago: just as we appreciate the immanent joy of operating a manual car, or reading a hand-written letter, or eating in the company of friends who are physically present, our ancestors may have found exhilaration in hunting, or the necessity of travelling long distances to catch up with loved ones, or the absence of a sensory overload brought about by the creation of electronic screens. This is an indeterminate question, and not one I seek to resolve. Rather, the concern at hand is the underlying transition towards superintelligence, of which such isolated examples are mere drops in the ocean, yet when considered together convey a frightening trend.

To end this section, consider two guiding principles for technological advancement: first, everything that can be digitised will be digitised. Second, everything that can be automated will be automated. Now, if guilty neither of fearmongering nor of an ignorant regression, one may instead be charged with romanticism: being scared of change which is bound to come. For reasons which will shortly become clear this reproach is misguided, too.


‘An existential risk is one that threatens to cause the extinction of Earth-originating intelligent life or to otherwise permanently and drastically destroy its potential for future desirable development […] we can now begin to see the outlines of an argument for fearing that a plausible default outcome of the creation of machine superintelligence is existential catastrophe […] in which humanity quickly becomes extinct.’ [3]

It seems instructive to briefly relay Nick Bostrom’s [4] reasoning for arriving at this conclusion. Of the varying paths to superintelligence, Bostrom anticipates it will first be attained via the artificial intelligence (AI) path, or else through whole brain emulation (machine intelligence created by scanning and closely modelling the computational structure of the human brain). [5] Such a breakthrough would likely be achieved by a single project, rather than several simultaneously, and thus result in a decisive strategic advantage for the one superintelligent agent. Bostrom believes a plausible consequence of such an advantage would be for the project to create a wise-singleton [6] in order to consolidate the dominant position. [7]

This process would take time, but several realistic phases are imaginable that would lead to the establishment of a wise-singleton [8]: 1) a pre-criticality phase, where scientific research continues and the AI agent is gradually capable of doing the research by itself; 2) a recursive self-improvement phase, where the AI ‘becomes better at AI design than the human programmers. Now when the AI agent improves itself, it improves the thing that does the improving. An intelligence explosion results’ [9], and the AI agent focuses on improving its own so-called superpowers, encompassing strategically relevant tasks [10]; 3) a covert preparation phase, in which the AI agent develops a masterplan for the eventual takeover, of a complexity that humans can neither fathom nor foresee; 4) an overt implementation phase, where the ‘AI has gained sufficient strength to obviate the need for secrecy. The AI [agent] can now directly implement its objectives on a full scale.’ [11]

At this point Bostrom turns to potential motivations which a wise-singleton superintelligent agent could have, in order to evaluate whether an existential catastrophe is a veritable threat. Two conclusions can be drawn [12]: first, ‘we cannot blithely assume that a superintelligence will necessarily share any of the final values stereotypically associated with wisdom and intellectual development in humans’. Second, ‘we cannot blithely assume that a superintelligence with [a pre-determined final goal, e.g., calculating the decimals of pi] would limit its activities in such a way as not to infringe on human interests.’ Thus, if the superintelligent agent happens, for example, to require more resources to pursue its instrumental goals, and these include natural resources such as copper and natural gas on which humans also depend, then an existential catastrophe is indeed a highly probable outcome.

Hence there lies an irrefutable danger in the technological advance towards superintelligence. However, for Bostrom (and many others [13]) the answer to this predicament is not to be found in abandonment of further research, or in prohibition — be it of an individual moral nature or else through the coercive power of the state — but rather through a range of potential safety mechanisms, considered adept at solving the ‘control problem’ of a superintelligent species. To name just a few, ideas range from capability control methods such as tripwires or restrictive environments and motivation selection methods such as domestication or direct specification of final goals, to the installation of human values into the system and a fundamental belief that the technological advance will otherwise, should all else fail, unearth required solutions which we do not yet know exist. [14] In the final section I will now turn to this supposed requirement of a solution within the technological sphere, as opposed to a radical discontinuation in research.


There is a telling passage on the penultimate page of Bostrom’s book highlighting the issue at hand. Consider the following:

‘Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb. Such is the mismatch between the power of our plaything and the immaturity of our conduct. Superintelligence is a challenge for which we are not ready now and will not be ready for a long time. We have little idea when the detonation will occur, though if we hold the device to our ear we can hear a faint ticking sound. For a child with an undetonated bomb in its hands, a sensible thing to do would be to put it down gently, quickly back out of the room, and contact the nearest adult. Yet what we have here is not one child but many, each with access to an independent trigger mechanism. The chances that we will all find the sense to put down the dangerous stuff seem almost negligible. Some little idiot is bound to press the ignite button just to see what happens.’ [15]

An unfortunate human tendency surfaces between these lines as a circumstantial dead-end, comparable to a natural disaster: asserting an absolute impossibility of influencing or else preventing the situation at hand. This tendency is not restricted to technological advance: it can be found in beliefs such as the refutation of realistic alternatives to capitalism. Where this tendency is misguided is in its ignorance of the man-made [16] nature of our human lives. Economic constructs of growth and productivity; political groups and parties; castes, nations and classes: these are nothing more than conceptualisations, indeed labels superimposed onto humankind. To assert that such phenomena are contingent on Darwinian ‘laws’ of evolution or on other ‘laws’ of nature is to forget the pivotal role of human decisions qua free will. [17] The same can be said for technological advance. A transition towards superintelligence is not written in the stars, so to speak, preconfigured or predetermined as a default stage in the evolution of humanity or, to transcend an anthropocentric mindset, in the evolution of our universe. The chances of preventing the detonation of our plaything, our bomb, are not negligible, for the bomb has not yet been created. Rather, its creation is an active process which at this very moment is being shaped and altered, and will not be finished for some time to come; it is a problem potentially culminating in an existential catastrophe, the solution to which is not necessarily restricted to safety mechanisms in the technological sphere, but can in fact encompass a radical discontinuation in research.

Here we reach the pivotal point of this essay. If we accept the man-made nature of technological advance, then a consideration of utmost importance emerges: what type of future do we envision? What role does technology, and what role does humankind play in this future? Indeed, some may find joy in metallic visions of fully automated and digitised future societies. They will highlight the safety improvements through infinite networks of information streams; the efficiency of automated transport; the health benefits of perfectly balanced diets based on individual nutritional requirements; the time saved through holographic representations as an alternative to physical travel; and many, many more. In short, this future would excel in a maximisation of productivity and efficiency. For Bostrom the speculation follows ‘that the tardiness and wobbliness of humanity’s progress on many of the “eternal problems” of philosophy are due to the unsuitability of the human cortex for philosophical work’ [18], and hence progress would be imminent once a superintelligence engages in this task.

But for many more I believe that such a community, devoid of its human constituents and counterparts, devoid of that which distinguishes humanity from other (intelligent) species, such as our ability to make mistakes and our capacity to love [19], would equate to a life devoid of meaning, and thus constitute a world in which one would not want to live. For there is a fundamental problem in the cold scientific perspective, challenging and negating all forms of humanism: a maximisation of productivity and efficiency will certainly take ‘work’ off our hands, thereby giving us more time. But in a fully automated and digitised future what will we do with this time? Which tasks can we engage in to give meaning to our lives if a superintelligence, which exceeds the cognitive performance of us humans in virtually all domains of interest [20], is around? With the creation of a superintelligent species the role of humanity would be reduced to mere spectating, dependent on the benevolence of a superior species (which hasn’t turned out too well for monkeys) and at the utter mercy of this superintelligence. Thus prior to opening this door, it is surely relevant to ask ourselves if this is the future we desire.

In light of these considerations it should by now be clear why the third charge of romanticism is misguided. When talking of potential change in the context of technological advance it is not merely individualistic considerations, that is, individual preferences on certain stages of development, that are at stake. We often contemplate change from an ethical standpoint, considering what impact it will have within a human structure. Yet this is different: here it is no longer a change which may alter certain plinths within the structure, but one which may cause a collapse of the whole human structure in itself. Here we are talking about a change which may result in the extinction of humankind. What would now be required is a radical discontinuation in research towards superintelligence, beginning first and foremost with the power of each individual as a consumer, for embedded in each piece of technology is a substantial amount of research, and reciprocal funding, towards these unfavourable ends (be it from AI models in speech recognition, alternative route-finders during heavy traffic, or recommender systems for music and books [21]). Thus we must all question, at the very least and with each new piece of technology that finds its way into our lives, what pitfalls may be attached and to which negative implications it may lead. Only by remaining vigilant to the potential perils of an artificial superintelligent species can we hope to prevent an existential catastrophe encompassing the extinction of humankind.



[1] Nick Bostrom — ‘Superintelligence’, 2014. Note: as the term superintelligence will be used throughout the essay, it is helpful to define it from the outset. Bostrom offers the following simplified definition: ‘any [artificial] intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest.’ Thus superintelligence is not to be confused with the more commonly used term artificial intelligence (AI). AI references a much narrower capacity, being just one path (albeit the most likely) by which a superintelligence may be created. Further, in recognising it as a superintelligent species I am assuming the intellect may take the form of robotic bodies, although it would probably be neither bound nor limited to a specific physical entity. As I will be quoting frequently from Bostrom’s book Superintelligence over the course of this essay, a short introduction seems appropriate: Nick Bostrom is a Professor of Philosophy at the University of Oxford, where he is also Director of the Future of Humanity Institute

[2] Ibid. pp. 23-25 and 75. For the voice of other experts, see for example ‘Was macht uns künftig noch einzigartig’, Die Zeit 14/2018 (S. 37-39)

[3] Ibid. pp. 140-141

[4] See footnote 1 above

[5] Nick Bostrom — ‘Superintelligence’, 2014, ch. 2, in particular pp. 27-29, 33-36 and 42-43. AI and whole brain emulation are the ‘direct’ paths to a machine intelligence, in contrast to more ‘indirect’ paths such as biological cognition or networks and organisations where it is primarily the intelligence of humans which is augmented, thereafter capable (in theory) of creating machine superintelligence

[6] Ibid. p. 124: ‘By “singleton” we mean a sufficiently internally coordinated political structure with no external opponents, and by “wise” we mean sufficiently patient and savvy about existential risks to ensure a substantial amount of well-directed concern for the very long-term consequences of the system’s actions.’ Later, Bostrom also discusses the potential for multipolar scenarios with multiple superintelligent agents. I will not dwell on this issue, for regardless whether it be single or multiple superintelligent agents, the existential risk at issue here will remain

[7] Ibid. pp. 95-109

[8] Ibid. pp. 114-118. I relay a shortened version here

[9] Ibid. p.116

[10] Ibid. p. 114. These include the following tasks: intelligence amplification, strategising, social manipulation, hacking, technology research and economic productivity

[11] Ibid. p. 117

[12] Ibid. pp. 140-141

[13] In fact, Bostrom is one of the more sceptical scientists on this issue, constantly forewarning of the existential risk associated with a potential intelligence explosion

[14] Nick Bostrom — ‘Superintelligence’, 2014, ch. 9, in particular pp. 175-176, and pp. 253-255. For the latter point, see the discussion on a preferred date of arrival on pp. 283-286

[15] Ibid. p. 319

[16] This is one of the few examples in the English language where the neutral form is rarely used. Of course I include all sexes in this term

[17] Again, although I realise that the premise of free will is still being heatedly discussed, I am hereby assuming man’s capacity for free will, and hope that science will one day prove this assumption to be true. If this is not the case, then our whole metaphysical structure must be overhauled, anyway, thus I need not worry about the potentially detrimental effects for this essay should this assumption be false

[18] Nick Bostrom — ‘Superintelligence’, 2014, p. 71

[19] For a discussion on varying characteristics which would likely distinguish us as humans in the future, see ‘Was macht uns künftig noch einzigartig’, Die Zeit 14/2018 (S. 37-39). Further, consider the following conclusion by Hubert Dreyfus in a review essay ‘Why Heideggerian AI failed and how fixing it would require making it more Heideggerian’, Philosophical Psychology (Vol. 20, No. 2, April 2007, pp. 247-268): ‘Merleau-Ponty’s and Freeman’s account of how we directly pick up significance and improve our sensitivity to relevance depends on our responding to what is significant for us given our needs, body size, ways of moving, and so forth, not to mention our personal and cultural self-interpretation. Thus, to program Heideggerian AI, we would not only need a model of the brain functioning underlying coupled coping such as Freeman’s, but we would also need — and here’s the rub — a model of our particular way of being embedded and embodied such that what we experience is significant for us in the particular way that it is. That is, we would have to include in our program a model of a body very much like ours with our needs, desires, pleasures, pains, ways of moving, cultural background, etc. If we can’t make our brain model responsive to the significance in the environment as it shows up specifically for human beings, the project of developing an embedded and embodied Heideggerian AI can’t get off the ground… The idea of super-computers containing detailed models of human bodies and brains may seem to make sense in the wild imaginations of a Ray Kurzweil or Bill Joy, but they haven’t a chance of being realized in the real world.’ (p. 265)

[20] See footnote 1 above

[21] Nick Bostrom — ‘Superintelligence’, 2014, pp. 17-18