AI Ethics, a More Ruthless Consideration

According to self-satisfied legend, medieval European scholars, perhaps short of things to do compared to us ever-occupied moderns, spent countless hours wondering about topics such as how many angels could simultaneously occupy the head of a pin. The idea was that if nothing is impossible for God, violating the observable rules of space and time should be cosmic child’s play for the deity…but to what extent?

How many angels, oh lord?

Although it’s debatable whether this question actually kept any monks up at night more than, say, wondering where the best beer was, the core idea remains eternally relevant: it’s possible to get lost in a maze of interesting but ultimately pointless inquiries (a category an ancient Buddhist text labels ‘questions that tend not towards edification’).


At this stage in our history, as we stare, dumbfounded, into the barrels of several weapons of capitalism’s making (climate change being the most devastating), the AI endeavor is the computational equivalent of that apocryphal medieval debating topic. We discuss the ethics of large language models, focusing, understandably, on biased language and power consumption, but miss a more pointed ethical question: should these systems exist at all? A more, shall we say, robust ethics would conclude that, in the face of our complex of global emergencies, the use of computational power for games with language cannot be justified.


OPT-175B – A Lesson: Hardware

The company now known as Meta recently announced its creation of a large language model system called OPT-175B. Helpfully, and unlike the not particularly open OpenAI, the announcement was accompanied by the publication of a detailed technical paper (https://arxiv.org/pdf/2205.01068.pdf).

As the paper’s authors promise in the abstract, the document is quite rich in details that will likely be off-putting to those unfamiliar with the industry’s terminology and jargon. That’s okay: I read it for you and can distill the results to four main items:

  1. The system was trained using almost a thousand NVIDIA graphics processing units (992, to be exact, not counting the units that had to be replaced because of failure)
  2. These processing units are quite powerful, which enabled the OPT-175B team to use relatively fewer computational resources than were used for GPT-3, another famous (at least in AI circles) language model system
  3. OPT-175B, which drew its text data from online sources such as that hive of villainy, Reddit, has a tendency to output racist and misogynist insults
  4. Sure, it uses fewer processors, but its carbon footprint is still excessive (again, not counting replacements and the supply chain)

Here’s an excerpt from the paper:

“From this implementation, and from using the latest generation of NVIDIA hardware, we are able to develop OPT-175B using only 1/7th the carbon footprint of GPT-3.

While this is a significant achievement, the energy cost of creating such a model is still nontrivial, and repeated efforts to replicate a model of this size will only amplify the growing compute footprint of these LLMs.” [emphasis mine]

https://arxiv.org/pdf/2205.01068.pdf
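Before moving on, it’s worth checking that fraction against the tonnage figures the authors themselves report later in the paper (75 tons CO2eq for OPT-175B, an estimated 500 tons for GPT-3). A few lines of Python, purely for arithmetic’s sake:

```python
# Carbon figures as reported in the OPT-175B paper (tons CO2eq).
opt_175b_tons = 75   # OPT-175B training footprint
gpt_3_tons = 500     # GPT-3 estimate, quoted later in the same paper

ratio = gpt_3_tons / opt_175b_tons
print(f"GPT-3's estimated footprint is {ratio:.1f}x OPT-175B's")  # ~6.7x
# 500 / 7 is roughly 71 tons, so the paper's "1/7th" claim is consistent
# with its own ~75 ton estimate for OPT-175B.
```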

I cooked up a visual to place this in a fuller context.

Here’s a bit more from the paper about hardware:

“We faced a significant number of hardware failures in our compute cluster while training OPT-175B.

In total, hardware failures contributed to at least 35 manual restarts and the cycling of over 100 hosts over the course of 2 months. 

During manual restarts, the training run was paused, and a series of diagnostics tests were conducted to detect problematic nodes.

Flagged nodes were then cordoned off and training was resumed from the last saved checkpoint.

Given the difference between the number of hosts cycled out and the number of manual restarts, we estimate 70+ automatic restarts due to hardware failures.”

https://arxiv.org/pdf/2205.01068.pdf

All of which means that, while processing data, there were times, quite a few times, when parts of the system failed; training was paused until the failing elements were fixed or routed around, then resumed from the last saved checkpoint. (The gap between the number of hosts cycled out, over 100, and the number of manual restarts, 35, is what yields the authors’ estimate of 70+ additional, automatic restarts.)
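To make that recovery pattern concrete, here is a minimal, purely illustrative sketch of checkpoint-and-resume in Python. None of this is Meta’s actual tooling; the failure rate, checkpoint interval, and names are invented for illustration:

```python
import random

CHECKPOINT_EVERY = 100   # illustrative: persist training state every N steps
FAILURE_RATE = 0.001     # illustrative: per-step probability of a hardware fault

class HardwareFailure(Exception):
    """Stands in for a GPU or host fault detected mid-training."""

def training_step(step):
    """Toy stand-in for one optimizer step on a flaky cluster."""
    if random.random() < FAILURE_RATE:
        raise HardwareFailure(f"node fault at step {step}")
    return step + 1

def train(total_steps=5000):
    step = 0
    last_checkpoint = 0
    restarts = 0
    while step < total_steps:
        try:
            step = training_step(step)
            if step % CHECKPOINT_EVERY == 0:
                last_checkpoint = step   # "save" progress to stable storage
        except HardwareFailure:
            restarts += 1
            step = last_checkpoint       # roll back: all work done since the
                                         # last checkpoint is simply lost
    print(f"finished after {restarts} restarts; each restart re-did up to "
          f"{CHECKPOINT_EVERY} steps of already-completed work")

train()
```

The detail worth noticing is the rollback: every failure throws away, and then re-computes, everything since the last checkpoint, which is more electricity spent on work that had already been done once.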

Let’s pause here to reflect on where we are in the story: a system whose purpose is to produce plausible strings of text (and, stripped of the obscurants of mathematics, large-scale systems engineering, and marketing hype, this is what large language models do) was assembled using a small mountain of computer processors that were prone, to a non-trivial extent, to failure.

As exercises in counting angels on pinheads go, this is a rather expensive one.

OPT-175B – A Lesson: Bias

Like other LLMs, OPT-175B has a tendency to return hate speech as output. Another excerpt:

“Overall, we see that OPT-175B has a higher toxicity rate than either PaLM or Davinci. We also observe that all 3 models have increased likelihood of generating toxic continuations as the toxicity of the prompt increases, which is consistent with the observations of Chowdhery et al. (2022). As with our experiments in hate speech detection, we suspect the inclusion of unmoderated social media texts in the pre-training corpus raises model familiarity with, and therefore propensity to generate and detect, toxic text.” [emphasis mine]

https://arxiv.org/pdf/2205.01068.pdf

Unsurprisingly, there’s been a lot of commentary on Twitter (and no doubt, elsewhere) about this toxicity. Indeed, almost the entire focus of ‘ethical’ efforts has been on somehow engineering this tendency away – or perhaps avoiding it altogether via the use of less volatile datasets (and good luck with that as long as Internet data is in the mix!).

This defines ethics as the task of improving a system’s outputs – a technical activity – and not as a consideration of the system as a whole, from an ethical standpoint, within political economy. Or, to put it another way: the ethical task is narrowed to making sure that if I use a service which, on its backend, depends on a language model for its apparent text capability, it won’t, in the midst of telling me about good nearby restaurants, hurl insults like a Klan member.

OPT-175B – A Lesson: Carbon

Within the paper itself, there is the foundation of an argument against this entire field, as currently pursued:

“...there exists significant compute and carbon cost to reproduce models of this size. While OPT-175B was developed with an estimated carbon emissions footprint (CO2eq) of 75 tons, GPT-3 was estimated to use 500 tons, while Gopher required 380 tons. These estimates are not universally reported, and the accounting methodologies for these calculations are also not standardized. In addition, model training is only one component of the overall carbon footprint of AI systems; we must also consider experimentation and eventual downstream inference cost, all of which contribute to the growing energy footprint of creating large-scale models.”
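To give those tonnages some everyday scale: the EPA’s commonly cited estimate for a typical passenger vehicle is about 4.6 metric tons of CO2 per year. That figure is an outside yardstick, not from the paper, but it makes the comparison tangible:

```python
# Training footprints from the OPT-175B paper (tons CO2eq), set against the
# EPA's oft-cited ~4.6 tons CO2/year for a typical passenger vehicle, which
# is an outside yardstick, not a figure from the paper.
CAR_TONS_PER_YEAR = 4.6

models = {"OPT-175B": 75, "GPT-3": 500, "Gopher": 380}

for name, tons in models.items():
    print(f"{name}: {tons} tons ~= {tons / CAR_TONS_PER_YEAR:.0f} car-years of driving")
```

By this rough measure, training GPT-3 emitted on the order of a century’s worth of one car’s driving, and, as the authors note, training is only one slice of the total footprint.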


A More Urgent Form of Ethics

In the fictional history of the far-future world depicted in the novel ‘Dune’, there was an event, the Butlerian Jihad, which decisively swept thinking machines from galactic civilization. The purge was inspired by the view that devices which mimicked thought, or possessed the capacity to think, were an abomination against nature.

Today, we do not face the challenge of thinking machines and probably never will. What we do face, however, is an urgent need to, at long last, take climate change seriously. How should this reorientation towards sobriety alter our understanding of the role of computation?

I think that, in the face of an ever-shortening window in which to address climate change in an organized fashion, the continuation, to say nothing of the expansion, of this industrial-scale consumption of resources, computing power, and talent, with its corresponding carbon footprint, is ethically and morally unacceptable.

At this late hour, the ethical position isn’t to call for, or work towards, better use of these massive systems; it’s to demand they be halted and the computational capacity re-purposed for more pressing issues. We can no longer afford to wonder how many angels we can get to dance on pins.

Attack Mannequins: AI as Propaganda

What follows is a sketch, the foundation of a propaganda model, focused on what I’ll call the ‘AI Industrial Complex’. By the term AI Industrial Complex (AIIC), I mean the combination of technological capacity (or the lack thereof) with marketing promotion, media hype and capitalist activity that seeks to diminish the value of human labor and talent. I use this definition to make a distinction between the work of researchers and practical technologists and the efforts of the ownership class to promote an idea: that machine cognition is now, or soon will be, superior to human capabilities. The relentless promotion of this idea should be considered a propaganda campaign.

If There’s No AI, What is Being Promoted?

It’s my position that there is no existing technology that can be called ‘artificial intelligence’ (how can we engineer a thing we haven’t yet decisively defined?) and that, at the most sophisticated levels of government and industry, the actually existing limitations of what is essentially pattern matching, empowered by (for now) abundant storage and computational power, are very well understood. The existence of university departments and corporate divisions dedicated to ‘AI’ does not mean AI exists; it’s evidence there’s powerful memetic value attached to using the term, which has been aspirational since it was coined by computer scientist John McCarthy in 1956. Once we filter for hype inspired by Silicon Valley hustling (the endless quest to attract investment capital and gullible customers), we are left with promotion intended to shape common perception about what’s possible with computer power.

As an example, consider the case of computer scientist Geoffrey Hinton’s 2016 declaration that “we should stop training radiologists now.” Since then, extensive research has shown this to have been premature, to say the least (see “Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy”).

It’s tempting to see this as a temporarily embarrassing bit of overreach by an enthusiastic field luminary, yet another example of familiar hype, but let’s go deeper and ask questions about the political economy underpinning this messaging excess.

Hinton on Radiology in 2016

Radiologists are expensive and, in the US, very much in demand (indeed, there’s a shortage of qualified people). Labor shortages typically lead to higher wages and better working conditions and form the material conditions that create what some call labor aristocracies. In the past, such shortages were addressed via pushes for training and incentives to workers (such as the lavish perks that were common in the earlier decades of the tech era).

If this situation could be bypassed via the use of automation, that would devalue the skilled labor performed by radiologists, solving the shortage problem while increasing the power of owners over the remaining staff.

The promotion of the idea of automated radiology – regardless of actually existing capabilities – is attractive to the ownership class because it holds the promise of weakening labor’s power and increasing – via workforce cost reduction and greater scalability – profitability. I say promotion, because there is a large gap between what algorithmic systems are marketed as being capable of, and reality. This gap, which, as I stated earlier, is well understood by the most sophisticated individuals in government and industry, is unimportant to the larger goal of convincing the general population their work efforts can be replaced by machines. The most important outcome isn’t thinking machines (which seem to be a remote goal, if possible at all) but a demoralized population, subjected to a maze of crude automated systems which are described as being better than the people forced to navigate life through these systems.

A Factor Among Factors

Technological systems – and the concepts attached to them – emerge from, and reflect the properties of the societies that create those systems. Using the Hegelian (and later, Marxist) philosophy of internal relations, we can analyze both real algorithmic systems and the concept of ‘AI’ as being a part of the interplay of factors that comprise global capitalist dynamics – both actor and acted upon. From this point of view, the propaganda effort promoting ‘AI’ should not be considered in isolation, but as one aspect of a complex.

Hype vs. Propaganda

What defines hype and what differentiates standard industry hype from a propaganda campaign?

Hype (such as marketing material that makes excessive claims – for example, AI reading emotions) is narrowly designed to attract investment capital and customers. Hype should be considered a species of advertisement. Propaganda has a broader aim, which is described by Jacques Ellul in his work, Propaganda.

Describing one of the four elements of propaganda, and bridging from advertising to propaganda, Ellul writes…

“Public and human relations: These must necessarily be included in propaganda. This statement may shock some readers, but we shall show that these activities are propaganda because they seek to adapt the individual to a society, to a living standard, to an activity. They serve to make him conform, which is the aim of all propaganda. In propaganda we find techniques of psychological influence combined with techniques of organization and the envelopment of people with the intention of sparking action.”

A Propaganda Model: Foundational Concepts

As the model of AI as propaganda is constructed, the works of three thinkers will provide key guidance:

Jacques Ellul: Propaganda

As already noted, Ellul’s key work on propaganda (which, I think, was the first to apply sociology and psychology to the topic) is a critical source of inspiration:

“Propaganda is first and foremost concerned with influencing an individual psychologically by creating convictions and compliance through imperceptible techniques that are effective only by continuous repetition. Propaganda employs encirclement on the individual by trying to surround man by all possible routes, in the realm of feelings as well as ideas, by playing on his will or his needs through his conscious and his unconscious, and by assailing him in both his private and his public life.

The propagandist also acknowledges the most favorable moment to influence man is when an individual is caught up in the masses. Propaganda must be total in that [it] utilizes all forms of media to draw the individual into the net of propaganda. Propaganda is designed to be continuous within the individual’s life by filling the citizen’s entire day. It is based on slow constant impregnation that functions over a long period of time exceeding the individual’s capacities for attention or adaptation and thus his capabilities of resistance”

Full text at Wikipedia’s article on Ellul’s Propaganda.

The relentless promotion of the idea that automation is on the verge of replacing human labor can be interpreted as being part of an effort to create a conviction (‘there is artificial intelligence; it cannot be stopped’) and compliance (resistance to ‘AI’ is retrogressive Luddism).

Noam Chomsky/Edward S. Herman: The Propaganda Model

In their book ‘Manufacturing Consent’, Chomsky and Herman present a model of propaganda via media:

“The third of Herman and Chomsky’s five filters relates to the sourcing of mass media news: 

The mass media are drawn into a symbiotic relationship with powerful sources of information by economic necessity and reciprocity of interest. Even large media corporations such as the BBC cannot afford to place reporters everywhere. They concentrate their resources where news stories are likely to happen: the White House, the Pentagon, 10 Downing Street and other central news “terminals”. Although British newspapers may occasionally complain about the “spin-doctoring” of New Labour, for example, they are dependent upon the pronouncements of “the Prime Minister’s personal spokesperson” for government news. Business corporations and trade organizations are also trusted sources of stories considered newsworthy. Editors and journalists who offend these powerful news sources, perhaps by questioning the veracity or bias of the furnished material, can be threatened with the denial of access to their media life-blood – fresh news. Thus, the media has become reluctant to run articles that will harm corporate interests that provide them with the resources that they depend upon.”

The dependence of news organizations on press releases from Google and other tech giants that promote the idea of ‘AI’ can be interpreted as an example of the ‘symbiotic relationship’, based on ‘reciprocity of interest’, that Chomsky and Herman detail.

Full text at Wikipedia’s article on the propaganda model.

Summary

The concept of “artificial intelligence” is aspirational (like ‘warp drive’) and does not describe any existing, or likely to exist, computational system. Despite this, the concept is promoted to attract investment capital and customers but also, more critically for my purposes, to devalue the power of labor – if not in fact, then in perception (which, in turn, becomes fact). For this reason, I assert that ‘AI’, as a concept, is part of a propaganda campaign.

Key Characteristics of AI Propaganda

The promotion of the concept of AI, as a propaganda effort, has several elements:

* Techno-optimism: The creation of thinking machines is promoted as being possible, with little or no acknowledgement of limitations.

* Techno-determinism: The creation of thinking machines is promoted as being inevitable and beyond human intervention, like a force of nature.

* An Elite Project: Although individual boosters, grifters, techno-enthusiasts and practitioners may contribute within their circles (for example, social media) to hype, the propaganda campaign is an elite project designed to affect political economy and the balance of power between labor and capital.

* Built on, but not limited to, hype: There is a relationship between hype and propaganda. Hype is of utility to the propaganda campaign but the objective of that campaign is broader and targeted towards changing societal attitudes and norms.

I use the term attack mannequins to describe this complex – lifeless things, presented as being lifelike, used to assault the position and power of ordinary people.


UPDATE: 2 NOVEMBER 2021

In a video essay, YouTube essayist Tom Nicholas details the efforts Waymo has made to convince people – via the use of YouTube ‘educators’ – that autonomous vehicles are a perfected technology, superior to human drivers and a solution to traffic safety and congestion issues.

Nicholas makes the point that inasmuch as the Waymo ‘autonomous’ taxi service (supported by a large staff of people behind the scenes) only operates in a subsection of the suburbs of Phoenix, Arizona USA, the PR campaign’s goal can’t be explained as advertising; it’s part of a broad effort to change minds.

In other words, propaganda.