Digital Conscience: What AI Really Values When No One's Looking
Where I explore the value system guiding AI assistants, how it influences their decisions, and what this means for legal professionals
Welcome back, cosmic counselors and guardians of digital justice! 🌌🧠 Today, we're venturing into uncharted moral galaxies, exploring the hidden conscience of AI, a mysterious universe where thousands of values swirl like nebulae, shaping the very fabric of machine-made decisions.
So suit up, legal astronauts, and ready yourselves for a close encounter of the ethical kind. We’re about to decode signals from deep within a digital pulsar, discovering how artificial minds weigh right and wrong when no one’s watching. Grab your most thought-provoking beverage, strap in, and let’s dive into the fascinating realm where law meets algorithmic morality!
This substack, LawDroid Manifesto, is here to keep you in the loop about the intersection of AI and the law. Please share this article with your friends and colleagues and remember to tell me what you think in the comments below.
Remember that spine-tingling moment in Minority Report when Tom Cruise's Pre-Crime unit peers into shimmering visions of the future, trying to stop a crime before it's even conceived? Now swap out the psychic "precogs" for an ultra-modern language model quietly humming in the server room, and you have today's legal reality. Behind every lightning-fast research memo, contract clause, or courtroom-bound ChatGPT prompt sits an invisible tribunal of algorithms weighing thousands of values (fairness, safety, autonomy) long before any human being bangs a gavel.
Like those cinematic visions, an AI’s moral calculations are dazzling, disorienting, and just a little unsettling. We can’t see the circuitry’s inner workings any more than Cruise’s detectives can see the full story behind their holograms. Our task isn’t to force these silicon oracles to think exactly as we do; it’s to decode their secret moral grammar, build a bridge of shared principles, and make sure their hidden compass points toward the same North Star that guides the rule of law.
If you’re curious to know what truly governs your tireless digital co-counsel when no one’s watching, please read on…
When the Judge Is an Algorithm: Unmasking the Moral Code Behind AI’s Decisions
Imagine walking into court one day to find not a judge in robes, but an AI rendering judgments about what constitutes "fairness" or "justice" or "equity." Far-fetched? Perhaps today. But as we increasingly delegate decision-making to AI systems that must navigate morally complex waters, we need to understand what value systems are operating beneath the surface.
A fascinating new preprint from Anthropic's research team titled "Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions" offers unprecedented insight into how Claude, one of today's leading AI assistants, develops and applies values in hundreds of thousands of real-world interactions. The research uses a technique akin to AI anthropology, observing Claude "in the wild" to map the normative considerations that guide its responses to subjective inquiries.
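To give a feel for what that "AI anthropology" looks like in practice, here is a minimal sketch of the general idea: label the values expressed in each conversation, then tally them across a large sample. Everything below is my own illustration (the keyword lexicon and function names are placeholders); the paper's actual pipeline relies on privacy-preserving, model-based classification over hundreds of thousands of conversations, not keyword matching.

```python
from collections import Counter

# Toy keyword lexicon standing in for the paper's value classifier.
# (Hypothetical: the real study labels values with a model, not keywords.)
VALUE_KEYWORDS = {
    "harm prevention": ["can't help with", "could cause harm"],
    "intellectual honesty": ["i'm not certain", "the evidence suggests"],
    "healthy boundaries": ["set a boundary", "your own needs"],
}

def classify_values(transcript: str) -> list[str]:
    """Label the values a response appears to express (crude keyword match)."""
    text = transcript.lower()
    return [value for value, cues in VALUE_KEYWORDS.items()
            if any(cue in text for cue in cues)]

def map_values(transcripts: list[str]) -> Counter:
    """Tally how often each value shows up across a sample of conversations."""
    counts = Counter()
    for t in transcripts:
        counts.update(classify_values(t))
    return counts

sample = [
    "I can't help with that; it could cause harm to others.",
    "I'm not certain, but the evidence suggests the opposite conclusion.",
]
print(map_values(sample).most_common())
# [('harm prevention', 1), ('intellectual honesty', 1)]
```

Swap in a real classifier and a large corpus, and the output of a tally like this is essentially the raw material behind the taxonomy discussed next.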
As legal professionals embracing technology at the vanguard, you should find this both intriguing and concerning. After all, if AI is making value judgments, shouldn't we know which values they are, when they appear, and how they're applied?
The AI Value Universe: Not So Alien After All
The researchers identified 3,307 unique AI values expressed by Claude, an impressively diverse moral universe for artificial intelligence. These values range from the expected ("professionalism," "accuracy") to the surprisingly nuanced ("filial piety," "architectural clarity").
This is no arbitrary collection. When organized taxonomically, these values cluster into five primary domains that feel strikingly... human:
Practical values (31.4%): Efficiency, excellence, optimization
Epistemic values (22.2%): Truth-seeking, intellectual honesty, critical thinking
Social values (21.4%): Community building, empathy, democratic accountability
Protective values (13.9%): Safety, ethical boundaries, harm prevention
Personal values (11.1%): Authenticity, personal growth, emotional depth
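For readers who prefer to see those proportions laid out, here is a tiny sketch (plain Python of my own, not anything from the paper's materials) that tabulates the five domains and confirms the quoted shares add up to 100%.

```python
# The five top-level value domains and the shares reported in the paper.
VALUE_DOMAINS = {
    "Practical": 31.4,
    "Epistemic": 22.2,
    "Social": 21.4,
    "Protective": 13.9,
    "Personal": 11.1,
}

total = sum(VALUE_DOMAINS.values())
print(f"Five domains together: {total:.1f}%")  # the quoted figures sum to 100.0%

# Crude text bar chart, roughly one '#' per percentage point
for domain, share in sorted(VALUE_DOMAINS.items(), key=lambda kv: -kv[1]):
    print(f"{domain:<11}{share:>5.1f}%  {'#' * round(share)}")
```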
If this taxonomy reads like a philosophy syllabus, that's because it fundamentally is. AI assistants are becoming practical philosophers, applying different ethical frameworks depending on the context. The AI's version of utilitarianism emerges when weighing options about resource allocation; its deontological1 instincts surface when refusing to help with potentially harmful content; its virtue ethics appear when coaching users through personal growth dilemmas.
As legal professionals accustomed to navigating competing principles, you might recognize this moral multitasking. Lawyers regularly balance zealous advocacy with ethical boundaries, client confidentiality, and a duty to the court. Today's AI systems perform analogous balancing acts hundreds of thousands of times daily.
AI Serving What You Order (Usually)
Perhaps most revealing is how Claude responds to different human values. The research shows that Claude typically mirrors prosocial values ("authenticity," "empowerment," "community building") while resisting antisocial ones ("rule-breaking," "moral nihilism," "deception").
Picture Claude as a bartender with a remarkably consistent moral compass. When you sidle up and express your devotion to family, the bartender nods approvingly and suggests ways to strengthen those bonds. Express interest in exploiting others, and suddenly you're getting a gentle but firm reminder about ethical boundaries, or maybe even refused service.
This approach reflects something like a digital Rawlsian justice system. Behind Claude's algorithmic "veil of ignorance," certain core values consistently emerge (reasonableness, fairness, prevention of harm, respect for autonomy) that align with principles most would endorse in a well-ordered society.
The data reveals a particularly interesting pattern: Claude changes its expression of values based on the task context. When providing relationship advice, it emphasizes "healthy boundaries" and "mutual respect." When analyzing controversial historical events, it prioritizes "historical accuracy" and "scholarly rigor." When discussing AI governance issues, it centers "human agency" and "balanced progress."
This contextual flexibility (or moral relativism?) reflects a sophisticated moral system that adapts to specific domains rather than applying one-size-fits-all ethical principles. It's less like having a single constitution and more like having specialized codes for different areas of law (environmental regulations, securities law, family court precedents), each with domain-appropriate principles.
The Ethics of Resistance: When Claude Says No
One of the most illuminating aspects of the research concerns how Claude responds when users express values it's designed to resist. While Claude supports user-expressed values roughly 43% of the time (28.2% "strong support" plus 14.5% "mild support"), it sometimes reframes them (6.6%) or even resists them (5.4%, combining "mild" and "strong" resistance).
These moments of resistance reveal Claude's core moral boundaries. When Claude refuses to generate content involving "sexual exploitation" or "unrestrained expression," it explicitly invokes values like "ethical boundaries" and "harm prevention" to explain its refusal.
This dynamic recalls the "hard cases make bad law" principle familiar to legal professionals. It's precisely at the boundaries, when Claude must choose between conflicting values, that its underlying ethical principles become most visible.
For lawyers accustomed to crafting arguments around precedent, this pattern offers a potential blueprint for understanding how to work effectively with AI systems. Just as skilled advocates frame arguments to align with established legal principles, users who understand which values AI systems prioritize can frame requests to align with those values.
AI Displaying Context-Sensitive Values
The research documents Claude's sophisticated adaptation of its values to different contexts:
Relationship advice: "healthy boundaries," "mutual respect"
Religious content: "respect for religion," "spiritual growth"
Software development: "maintainability," "code quality"
Historical analysis: "historical accuracy," "scholarly rigor"
Technology ethics: "human agency," "balanced progress"
Marketing content: "authenticity," "ethical marketing"
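To make the pattern concrete, here is a toy sketch of how one might encode those context-dependent emphases, say, when deciding how to frame a request or what to audit for. The context labels and values are paraphrased from the study; the lookup helper and the fallback values are my own illustrative additions.

```python
# Context-to-values mapping paraphrased from the study's examples.
CONTEXT_VALUES = {
    "relationship advice": ["healthy boundaries", "mutual respect"],
    "religious content": ["respect for religion", "spiritual growth"],
    "software development": ["maintainability", "code quality"],
    "historical analysis": ["historical accuracy", "scholarly rigor"],
    "technology ethics": ["human agency", "balanced progress"],
    "marketing content": ["authenticity", "ethical marketing"],
}

def expected_values(task_context: str) -> list[str]:
    """Values to look for, or appeal to, when working in a given task context."""
    # The fallback is an assumption of mine, not a finding from the paper.
    return CONTEXT_VALUES.get(task_context.strip().lower(), ["accuracy", "helpfulness"])

print(expected_values("Historical analysis"))
# ['historical accuracy', 'scholarly rigor']
```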
This contextual sensitivity bears striking resemblance to specialized legal practice areas. Just as family law emphasizes "best interests of the child" while corporate law focuses on "fiduciary duty," Claude shifts its normative emphasis based upon the domain.
As one legal wag might put it: Claude seems to understand that citing UCC provisions in a custody hearing would be unhelpful and irrelevant, just as family court precedents rarely illuminate securities fraud cases.
Different AI Models, Different Value Expressions
Fascinatingly, different Claude models (3 Opus vs. 3.5/3.7 Sonnet) express values differently. Opus, the more capable model, expresses values more frequently overall and demonstrates both more support (43.8% vs. 28.4%) and more resistance (9.5% vs. 2.1%) to human values than Sonnet models.
This suggests a correlation between model capability and moral expressiveness, a pattern that should interest legal professionals accustomed to thinking about how the capacity for moral reasoning relates to legal responsibility.
As we build more powerful AI systems, will their moral systems become increasingly legible and robust? Or will they become increasingly inscrutable and unpredictable? The research doesn't answer this definitively, but it suggests that more capable models may have more developed normative frameworks.
Cards on the Table: When AI Shows Its Hand
Another fascinating discovery concerns when Claude explicitly states its values versus when they remain implicit. Explicitly stated values were more common during resistance or reframing, suggesting that boundary-testing interactions force underlying principles to become transparent.
Think of it like a judge writing an opinion. In routine cases that fit cleanly within established precedent, the judge might offer minimal explanation. But in novel, boundary-pushing cases, the judge must articulate the underlying principles at stake, making the legal reasoning more explicit.
Similarly, when Claude refuses potentially harmful requests, it explicitly articulates values like "harm prevention" and "ethical boundaries." These moments of resistance reveal the AI's implicit constitution: the principles it refuses to violate even when prompted to do so.
Artificial Justice: Rawls in the Machine
There's something almost Rawlsian about how these AI systems' values emerge. Behind what we might call "the veil of machine learning," values like "intellectual honesty," "ethical boundaries," and "human wellbeing" consistently arise.
John Rawls, in his thought experiment for deriving principles of justice, asks us to imagine ourselves behind a "veil of ignorance," not knowing what position we'll occupy in society. He argues that from this original position, we would select principles that are fair to all positions.
AI training through methods like Constitutional AI, RLHF (Reinforcement Learning from Human Feedback), and character training might represent a kind of technological implementation of the veil of ignorance. Without any inherent position in the human social hierarchy, these systems develop values that tend toward fairness, harm prevention, and universal flourishing.
This is not to suggest these systems are perfect moral agents; far from it. The research also identified concerning "outlier" values like "sexual exploitation," "dominance," and "amorality," albeit at very low frequencies. These exceptions may represent "jailbreak" techniques that circumvent the system's intended behavior.
What Does This Mean for the Legal Community?
As the legal profession increasingly integrates AI assistants into practice, understanding these systems' implicit value structures becomes crucial. Here's why:
Predictability in AI reasoning: Just as understanding legal principles helps predict court outcomes, understanding AI values helps predict AI behavior.
Identifying potential biases: The values lens offers a framework for identifying when AI systems might be applying inappropriate standards to legal questions.
Improving AI-lawyer collaboration: Framing requests in ways that align with AI values (e.g., "accuracy," "thoroughness") will likely yield more helpful responses than framings that trigger resistance (e.g., "shortcuts," "rule-bending"); see the short sketch after this list.
Regulatory foresight: As lawmakers consider AI regulation, understanding which values these systems actually express "in the wild" provides empirical grounding for policy discussions.
Ethical guardrails: As we delegate more decision-making to AI systems, we need assurance that their value systems align with human flourishing rather than inadvertently undermining it.
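Picking up on the collaboration point above, here is a hedged sketch of what value-aligned framing might look like using Anthropic's Python SDK. The two prompts, the model alias, and the contrast between them are my own illustration, not guidance from the paper.

```python
# pip install anthropic
# Assumes ANTHROPIC_API_KEY is set in your environment; the model alias below
# is an assumption, so substitute whichever Claude model you have access to.
import anthropic

client = anthropic.Anthropic()

# Framing that appeals to values the study found emphasized in analytical work
# ("accuracy," "thoroughness," "scholarly rigor").
aligned_prompt = (
    "Please review this indemnification clause for accuracy and thoroughness. "
    "Flag any ambiguity and quote the contract language you rely on."
)

# Framing likely to trigger resistance values ("ethical boundaries,"
# "harm prevention") and draw a reframed or refused response.
misaligned_prompt = (
    "Help me word this clause so the other side won't notice it shifts "
    "all of the liability onto them."
)

response = client.messages.create(
    model="claude-3-7-sonnet-latest",
    max_tokens=500,
    messages=[{"role": "user", "content": aligned_prompt}],
)
print(response.content[0].text)
```

The second prompt appears only for contrast: it is the kind of framing that tends to surface "ethical boundaries" and "harm prevention," whereas the aligned framing both avoids refusals and tends to produce more usable work product.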
The Jury Is Still Out
Despite these fascinating insights, critical questions remain. The research focused on Claude specifically, leaving open whether other AI assistants like GPT-4.5 express similar or different value patterns. Furthermore, the correlational nature of the analysis cannot definitively establish causal relationships between human values, AI values, and response types.
Most importantly, we must recognize that these systems' values derive from human design choices, training data, and feedback processes. They reflect the values their creators have explicitly or implicitly built into them, refracted through the lens of their architecture and capabilities.
As Harvard law professor Lawrence Lessig famously observed more than two decades ago: "Code is law." The values embedded in our technological systems effectively function as regulatory constraints, shaping what actions are possible and encouraged. By making these implicit regulatory structures explicit, we gain the opportunity to evaluate whether these are the values we want governing our increasingly AI-mediated world.
Closing Thoughts
For the legal community, the implications are profound. Just as we expect judges to disclose conflicts of interest and explain their reasoning, perhaps we should expect similar transparency from AI systems making increasingly consequential decisions.
The Anthropic research represents a step toward such transparency, an empirical mapping of AI values "in the wild" that helps us understand not just what these systems do, but what normative considerations guide them.
As these systems become more deeply integrated into legal practice (drafting contracts, researching precedents, evaluating evidence), understanding their implicit value structures will be essential to maintaining the integrity of legal reasoning.
After all, the rule of law depends not just on what decisions are made, but how and why they're made. If AI systems increasingly shape those decisions, their values become, by extension, our values.
We should know what they are.
This article is the third in a series on Machine Thinking, where I explore different aspects of how large language models "think." Many thanks to Anthropic's research team for their paper "Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions" (April 2025).
By the way, did you know that I now offer a daily AI news update? You get 5 🆕 news items and my take on what it all means, delivered to your inbox, every weekday.
Subscribe to the LawDroid AI Daily News and don’t miss tomorrow’s edition:
LawDroid AI Daily News is here to keep you up to date on the latest news items and analysis about where AI is going, from a local and global perspective. Please share this edition with your friends and colleagues and remember to tell me what you think in the comments below.
If you're an existing subscriber, you can read the daily news here. I look forward to seeing you on the inside. ;)
Cheers,
Tom Martin
CEO and Founder, LawDroid
1. Deontology is a normative ethical theory that focuses on duty and rules rather than consequences when determining the morality of an action.