Real vs. Fictional: Can Reality Survive AI?
Where I discuss how AI implodes the distinction between the real and the fictional
Thanks again to all my readers! Appreciate you. ❤️
And to you newbies, you’ve matched the torn pieces of the map, X marks this spot! 🏴‍☠️
This substack, LawDroid Manifesto, is here to keep you in the loop about the intersection of AI and the law. Please share this article with your friends and colleagues and remember to tell me what you think in the comments below.
I want to dig a little deeper into how AI, and especially generative AI, is changing the nature of reality right before our eyes. Cloned voices, cloned faces, cloned identities. It’s no longer theoretical, but real this time. Or is it? What is real? What is fictional? And, does the way we determine “real” have to change?
If this sounds like the red pill you’d like to swallow, read on…
Is It Real or Is It Memorex?
On March 29, 2024, OpenAI announced that an improvement to its Voice Engine now makes it possible to use a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.1
Let me unpack that.
Natural-sounding speech: until now, synthesized speech or text-to-speech (TTS) has been able to create realistic-sounding human voices. But those voices have lacked emotive force, unable to capture the nuances of emotion, varying across words and phrases, that we, as humans, bring to our conversation.2
15-second audio sample: until now, voice cloning required 30 minutes or more of sample audio in order to have enough data to train a voice simulation.3 Sometimes size does matter, and, in this case, shrinking the audio sample required to 15 seconds unlocks many potential use cases.
But, you be the judge, can you tell the difference between the real and fake voice?
[Embedded audio: Real Voice]
[Embedded audio: Fake Voice]
On April 16, 2024, Microsoft introduced VASA, a framework for generating lifelike talking faces of virtual characters with appealing visual affective skills (VAS), given a single static image and a speech audio clip.4 “VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness.”5
As they say, seeing is believing, so take a look:
Simulacrum
Let me unpack that.
Visual affective skills: until now, video avatars have been capable of lip syncing to audio to mimic how we talk, but with a blunted affect.6 Such video avatars appear more like lip-synced marionettes, where the words spoken often do not match the emotional expressiveness of the face and vice versa.
A single static image: until now, avatar cloning required 4 minutes of talking footage, a green screen, professional lighting, and 5-7 days of processing time.7 When created from a single static image, video avatars have not had the range of motion or emotional expression to match the words spoken.
Now we have to contend not only with a naturally emotive voice created from a short audio sample, but also with high-fidelity video of a naturally emotive face created from a single static image, all of which can be produced in the time it takes to order a cappuccino.
Both companies claim to be taking a cautious approach. OpenAI noted in its release: “[W]e are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse. We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities.” Microsoft was also quick to point out: “This is only a research demonstration and there's no product or API release plan.”
As it is easy to imagine the immediate nefarious uses of either of the aforementioned technologies, it would be understandable if one were to find little comfort in these caveats, especially when one considers the relentless pace of progress, the financial incentives for being first to market, and leaks of new models.8
What Could Go Wrong?
Almost as if prompted by my writing this article about the real versus the fictional, the following popped into my newsfeed last night:
Ex-athletic director accused of framing principal with AI arrested at airport with gun9
Baltimore County Police arrested Pikesville High School’s former athletic director Thursday morning and charged him with crimes related to the alleged use of artificial intelligence to impersonate Principal Eric Eiswert, leading the public to believe Eiswert made racist and antisemitic comments behind closed doors.
Dazhon Darien, 31, was apprehended as he attempted to board a flight to Houston at BWI Airport, Baltimore County Police Chief Robert McCullough said at a news conference Thursday afternoon... Darien was charged with disrupting school activities after investigators determined he faked Eiswert’s voice and circulated the audio on social media[.]
“The audio clip ... had profound repercussions,” police wrote in charging documents. “It not only led to Eiswert’s temporary removal from the school but also triggered a wave of hate-filled messages on social media and numerous calls to the school. The recording also caused significant disruptions for the PHS staff and students.”
Police say Darien made the recording in retaliation after Eiswert initiated an investigation into improper payments he made to a school athletics coach who was also his roommate.
Human beings find inventive ways to solve problems, using tools (AI or otherwise) in ways their designers never intended. It is incumbent on the tool’s creator to anticipate such misuse and to attempt to design against it. But there are limits. It is impossible to anticipate and prevent every misuse and malfeasance. A knife can be used to intentionally injure. A car can be used to maim. A gun can be used to murder. Even with moral codes and laws written to proscribe unacceptable behavior, the unthinkable still happens. Even with internal and external control methods, human ingenuity, like a dandelion growing through a crack in the sidewalk, finds a way.
Our first instinct may be to work the problem: How do we detect the fake recording? What are the telltale signs of AI-generated evidence? Can we watermark AI-generated content to identify it? AI experts can be used to tell fact from fiction.
For example, in the Baltimore case above, “Experts in detecting audio and video fakes [said] that there was overwhelming evidence the voice is AI-generated. They noted its flat tone, unusually clean background sounds and lack of consistent breathing sounds or pauses as hallmarks of AI. They also ran the audio through several different AI-detection techniques, which consistently concluded it was a fake, though they could not be 100% sure.” Though they could not be 100% sure.
How long will it take to erase those signs? When will the imitation become indistinguishable from the imitated? If the past year and a half is any indication, I would say it’s likely already happened. We have slipped off the slippery slope and, like Wile E. Coyote, just don’t know it yet.
The Desert of the Real
I think it’s a natural reaction to become numb to the daily barrage of ecstatic announcements of unprecedented technological revolution. Even for a tech progressive like me, the change is overwhelming because it implies not merely incremental progress, but a discontinuity between the world as it has always been and a new world that dreams things that never were.
⚠️ [Waxing philosophical Alert]: What of this new and undiscovered country?
Jean Baudrillard explored this notion of real versus fictional in his seminal book, "Simulacra and Simulation," published in 1981. Baudrillard's view was that, in a postmodern culture, reality has been so thoroughly saturated by simulacra—imitations that have either no original or no relation to reality—that it has been transformed into a barren wasteland, devoid of the rich and unpredictable textures of genuine life, something he called “the desert of the real.”
The "real" as we understand it is no longer tied to natural, observable phenomena but is instead a construct, heavily mediated by technology.
Abstraction today is no longer that of the map, the double, the mirror or the concept. Simulation is no longer that of a territory, a referential being or a substance. It is the generation by models of a real without origin or reality: a hyperreal.10
This post-modernist perspective was anticipated by Nietzsche:
Truths are illusions which we have forgotten are illusions — they are metaphors that have become worn out and have been drained of sensuous force, coins which have lost their pictures and now matter only as metal, no longer as coins.11
Baudrillard’s view is not, as interpreted by the Wachowskis in The Matrix movies, that there is a choice between the blue pill and the red pill, a choice between real and fictional worlds. Rather, it is that the Matrix (to use that metaphor) has so supplanted the real world that it is Matrixes (Matrices?), like turtles, all the way down.
To complete the thought, Philip K. Dick had this to say:
I consider that the matter of defining what is real — that is a serious topic, even a vital topic. And in there somewhere is the other topic, the definition of the authentic human. Because the bombardment of pseudo-realities begins to produce inauthentic humans very quickly, spurious humans — as fake as the data pressing at them from all sides… Fake realities will create fake humans. Or, fake humans will generate fake realities and then sell them to other humans, turning them, eventually, into forgeries of themselves. So we wind up with fake humans inventing fake realities and then peddling them to other fake humans. It is just a very large version of Disneyland.12
Although this discussion might seem grandiose and academic, I believe this kvetching is long overdue. In the breathless analysis of the latest AI developments, we need to make space to ponder what it all means.
What Does It Mean to Be Real?
I have to admit that I’m a little stumped. I previously wrote about how generative AI is changing the nature of authorship. There, I suggested redefining the concept of authorship to fit the new AI paradigm. Something similar may be in order here.
Certainly, even after generative content has saturated the interwebs, we will continue to wake up in the morning, have a cup of coffee, walk the dog, have conversations, work on something, worry about something, break bread with our loved ones, and rest at the end of the day. That won’t change.
We will still have physics, biology and chemistry to describe the rules of how the world works. Water will still be made by the combination of two hydrogen atoms with one oxygen atom, the Earth will still orbit around the sun, and humans will still be made from a unique combination of DNA. That won’t change.
What will change is that all media (text, images, video, and audio) that we use to communicate will become unauthenticatable (if that’s not a word, it will be). Our relationship with information itself must evolve. Trust may no longer hinge on verifiability but on the value and impact of the content itself. In the future, the 'realness' of media may come to depend on the connections and meaning it fosters rather than the authenticity of its source.
To quote from The Velveteen Rabbit:
"Real isn't how you are made," said the Skin Horse. "It's a thing that happens to you. When a child loves you for a long, long time, not just to play with, but REALLY loves you, then you become Real."13
Thus, as we move into a world thick with generated content, the question of what it means to be real may shift from being a query about origins to one about consequences and relevance. The real challenge will be to ensure that amidst this sea of generated realities, we do not lose sight of what genuinely enriches our lives and communities.
Closing Thoughts
As generative AI transforms our informational landscape, blurring the boundary between real and fake, we face a crisis of truth. In a world where any voice, video or image can be synthetic, how do we define authenticity? Just as AI challenges our notions of authorship, it will force us to rethink our understanding of reality itself.
To navigate this new world, we'll need sharper critical thinking skills to separate real from fake. Schools, media, and law will have to adapt to a flood of AI content. At the same time, if we judge AI creations on their own merits - the meaning and inspiration they offer, not just their literal truth - exciting new possibilities open up.
As digital life gets more artificial, visceral real-world experiences will become more important than ever. Face-to-face talks, nature, touch: these will keep us grounded and human in an age of simulation. The "real" and the "hyperreal" will mix in complex ways that can enrich our lives.
Society will need to have an ongoing conversation to figure this out. The nature of truth, meaning, and reality in the time of AI is a deep challenge - but an opportunity too. We're in new territory, but with careful thinking, we can map a path forward. The discussion is only starting.
1. Navigating the Challenges and Opportunities of Synthetic Voices, OpenAI Blog, March 29, 2024, https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices
2. https://cloud.google.com/text-to-speech/docs/basics
3. https://elevenlabs.io/voice-cloning
4. VASA-1: Lifelike Audio-Driven Talking Faces, Generated in Real Time, Microsoft Research, April 16, 2024, https://www.microsoft.com/en-us/research/project/vasa-1/; Research paper, https://arxiv.org/pdf/2404.10667
5. Id.
6. https://www.synthesia.io/features/avatars
7. https://www.heygen.com/create-avatar
8. Meta’s powerful AI language model [LLaMA] has leaked online — what happens now?, The Verge, March 8, 2023, https://www.theverge.com/2023/3/8/23629362/meta-ai-language-model-llama-leak-online-misuse
9. Ex-athletic director accused of framing principal with AI arrested at airport with gun, The Baltimore Banner, April 25, 2024, https://www.thebaltimorebanner.com/education/k-12-schools/eric-eiswert-ai-audio-baltimore-county-YBJNJAS6OZEE5OQVF5LFOFYN6M/
10. Jean Baudrillard, Simulacra and Simulation (1981) The University of Michigan Press, Ann Arbor, p. 1.
11. Friedrich Nietzsche, On Truth and Lie in an Extra-Moral Sense (1873).
12. Philip K. Dick’s 1978 speech, titled “How To Build A Universe That Doesn’t Fall Apart Two Days Later,” found in The Shifting Realities of Philip K. Dick: Selected Literary and Philosophical Writing (1995) Pantheon Books, New York.
13. Margery Williams, The Velveteen Rabbit (2004) Egmont Books, London.