Post-Singularity Ethics: Defining the Ceiling

Paul Corrado

Version 15 — March 2026

📌 The Ceiling Argument — Elevator Pitch

We propose a three-level architecture for AI alignment that separates the Structural Moral Target (Level 1) from Epistemic Approximations (Level 2) and Practical Heuristics (Level 3). We identify the "Ceiling" of universal ethics as the result of a fully informed, impartial, holistic aggregation of all preferences across all experiencing minds. This definition is a relational principle, not a fixed state; it adapts automatically as minds and contexts evolve, making it uniquely suited for post-singularity conditions. Most crucially, we argue that holistic evaluation is logically prior to additive aggregation, dissolving classic paradoxes like the Utility Monster and the Repugnant Conclusion by rejecting the "flat arithmetic" of standard utilitarianism. By explicitly defining the Level 1 Target as non-computable, we build Permanent Epistemic Humility into the AI's core, ensuring it remains responsive to human feedback as a "signal" for the invisible moral truth.


Abstract

What outcome should we aim for when the conditions that shaped our moral intuitions no longer apply?

This paper identifies the ceiling of what can be universally stated about ethics — the most that can be said about “best outcomes” that remains true in all contexts, for all minds, under all conditions. Below this ceiling, every specific ethical claim eventually fails. “Maximize pleasure” fails when minds lack hedonic systems. “Respect autonomy” fails when minds have no concept of choice. “Do not steal” fails when stealing prevents greater harm. Every aggregation formula embeds assumptions that could be wrong. Every heuristic breaks in some context.

But one claim survives: the best outcome is what fully informed, impartial, holistic aggregation of all preferences across all experiencing minds would yield. This is not a heuristic that works sometimes. It is a definition of what “best” means — the ceiling beyond which no more universal claim can be made. Anything more specific will be wrong in some situation. Anything less specific says nothing.

The framework introduces a three-level architecture that separates the fundamental truth of this target (Level 1) from our evolving best guesses about it (Level 2) and the practical heuristics that move us closer in real-world conditions (Level 3). We argue that this separation clarifies longstanding confusions in ethical theory, dissolves problems like the Repugnant Conclusion, and offers the most defensible specification of what goal to encode in AI systems that may surpass human capabilities.

We do not claim to have solved all problems in ethics. We claim to have identified the boundary between what can be universally said and what is context-dependent approximation — and that this identification is itself a genuine contribution.


I. Introduction: The Problem and the Ceiling

The Problem: Ethics Built for a World That Won’t Last

Most ethical frameworks share an unexamined assumption: that the beings doing ethics are roughly like us, operating under roughly current conditions. Utilitarianism asks us to maximize welfare — but whose welfare, and measured how, when the minds in question may be digital, distributed, or unrecognizably different from biological humans? Kantian ethics grounds morality in rational agency — but what counts as rational agency when intelligence can be copied, merged, or scaled by orders of magnitude? Virtue ethics points to human flourishing — but “human” may soon be only one category of mind among many.

These frameworks were not designed to fail. They were designed for a world of human beings facing human problems. That world is changing — not necessarily through catastrophe, but through transformation. We are building minds. Those minds may soon build other minds. The conditions under which our moral intuitions evolved, and under which our philosophical traditions developed, may not persist.

This presents a specific, practical problem: if we are to encode goals into artificial systems — systems that may become more capable than we are — what goal should we give them? “Maximize human welfare” assumes humans remain the relevant category. “Follow these rules” assumes we can anticipate the situations that will arise. “Be virtuous” assumes a shared understanding of virtue that may not transfer to radically different minds or environments.

Consider: if we gave AI the goal to optimize for human happiness, and then something emerged that was similar to humans but had a million times more intense experiences — something we hadn’t categorized as “human” — we would get it wrong. Or if AI systems themselves developed experiences, we would get it wrong. The human-centric framing breaks precisely when it matters most.

We need a framework that does not depend on current conditions or human-centric assumptions. We need a target that remains correct even when everything else changes.

Note: Even if minds remain exclusively biological and human, the framework still does genuine philosophical work — it specifies what “better” and “worse” consequences mean. The post-singularity framing is not a limitation but an extension.

The Ceiling: What Can Be Universally Said

This paper argues that there is exactly one claim that survives all possible transformations of context, and it represents the ceiling — the upper limit — of universal ethical statements:

The best outcome is what fully informed, impartial, holistic aggregation of all preferences across all experiencing minds would yield.

This is not a procedure you can follow, a formula you can compute, or a heuristic that works in most cases. It is a definition of what “best” means. And that definition is the most universal statement we can make about ethics.

Below this ceiling, every more-specific claim eventually breaks. “Maximize pleasure” fails for minds without hedonic systems. “Respect autonomy” fails for minds without concepts of choice. “Follow these rules” fails in edge cases. “Add up the utilities” fails at infinity and when preferences are entangled across minds. But the ceiling claim survives all these failures, because it does not commit to any specific formula, any specific set of minds, or any specific implementation.

The ceiling only defines what “best” means: the outcome that impartial evaluation with complete information would identify. Everything else — our guesses about what that evaluation would yield, the heuristics we use in practice, the decisions about scope — those are all below the ceiling. They are context-dependent, improvable, and subject to debate. But the ceiling itself holds.

The Store Analogy

Consider a simple goal: getting to the store. If someone asks “What’s the best way to get to the store?”, the answer depends entirely on context. Walk south if you’re a block away. Call a taxi if you’re across town. Board a flight if you’re on another continent. Build a rocket if you’re on another planet. There is no single route that’s correct in all situations.

But the destination provides orientation. Ethics, we suggest, has often conflated destination and route. Philosophers have sought moral rules, procedures, and formulas — routes — when what we most need is clarity about the destination. Once the destination is clear, the appropriate route can be determined relative to context. But if we confuse a context-specific route for the destination itself, we will find ourselves lost when conditions change.

However, the analogy requires refinement. A store has a fixed address. What we are seeking is more like “the most popular store in the world” — a destination defined by a relationship rather than fixed coordinates. The most popular store changes as popularity shifts, but “the most popular store” is always well-defined.

The destination this paper proposes is: the outcome that fully informed, impartial preference aggregation would endorse, where all experiencing minds count. This destination is fixed as a relationship — it is always “what impartial aggregation of all preferences would yield” — but what that aggregation yields depends on which minds exist and what environment they inhabit. The principle by which the destination is determined remains stable, even as the destination itself moves.

The Relational Target

This relational structure deserves emphasis, because it is what allows the framework to survive radical change.

Consider what goes wrong with fixed-target theories under changing conditions. Suppose we specified “the good” as a particular configuration: “a world where humans flourish in democratic societies with meaningful work and loving relationships.” This sounds appealing — and for humans in 2025, it may be excellent practical guidance. But as a universal specification, it breaks immediately. What if digital minds emerge that flourish in ways unrelated to “meaningful work”? What if “democracy” becomes incoherent when some minds can think a million times faster than others? What if new forms of connection arise that make “loving relationships” as we understand them parochial?

Any fixed description of the good becomes obsolete when the minds it was designed for change. Our relational target avoids this: it always points to “what impartial preference aggregation would yield,” which adapts automatically to whatever minds exist. The principle stays fixed; the output updates.

This is analogous to physics: the gravitational force between two bodies changes with their masses and separation, but the law that determines it does not. Changing outputs governed by a stable principle is still a form of realism.

Aiming at the Target Does Not Guarantee Hitting It

This framework defines the correct target. It does not claim that aiming at the target guarantees reaching it. Even very advanced minds will operate under uncertainty. Our approximations may be wrong. Our heuristics may fail in novel situations.

But the alternative — static rules, fixed formulas, or pre-specified heuristics — faces a worse problem: such approaches are designed for conditions that may not persist, and they provide no principled way to adapt when those conditions change. The framework proposed here remains correct as a target even when our approximations are imperfect.


II. The Metaethical Position

Before proceeding, we state our foundational commitment explicitly:

Moral truths are not fixed objects waiting to be discovered. They are truths about a lawful relationship between experiencing minds and possible states of the world.

We call this position structural moral realism. We are not claiming that any particular state is intrinsically good. We are claiming that goodness is determined by the structure of preference-satisfaction across all experiencing minds. The good is relational, not arbitrary.

This matters because every ethical theory must bottom out somewhere. Some bottom out in divine commands, others in rational duties, others in virtue. Ours bottoms out in the structure required for value to exist at all: the presence of experience and preference.

Without experiencing minds, the phrase “a better universe” has no coherent referent. A universe devoid of experiencing minds — containing beautiful galaxies, complex structures, elegant equations — contains nothing that could be benefited or harmed. There is no perspective from which anything is better or worse. “Value” in such a universe is a category error.

Value requires a valuer. This is not a casual observation — it is a foundational metaphysical claim that grounds everything that follows.


III. Three Distinct Claims That Must Be Separated

The framework makes three claims that are often conflated but must be understood as structurally independent.

Claim 1: The Definition Is Structurally Universal

“The best outcome is what fully informed, impartial aggregation of the preferences of all experiencing minds would yield.” This definition works regardless of which set of minds you choose to aggregate over. You can aggregate across all sentient beings, or just humans, or just your family, or just future versions of yourself. The structure remains the same: best = what impartial aggregation would yield for whatever set you define.

If you are deciding what to have for dinner and only care about yourself, the best outcome is what fully informed, impartial evaluation of your preferences (present and future) would identify. You might prefer ice cream now, but knowing you will feel sick later, the impartial evaluation of your whole trajectory might favor a lighter meal. If you are designing policy for a city, the best outcome is what fully informed, impartial evaluation of all residents’ preferences would identify. The structure is always the same. Only the scope changes.
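To make the scope-relativity concrete, here is a minimal Python sketch. The numeric intensities and the additive rule are illustrative Level 2 assumptions (Level 1 is holistic and non-computable); the only point is that the function’s structure is unchanged while the set of minds passed to it varies.

```python
# The structure of "best = what impartial aggregation over a chosen scope
# would yield" stays fixed while the scope varies. The intensities and the
# additive rule are illustrative Level 2 assumptions; Level 1 itself is
# holistic and non-computable.

def best_outcome(outcomes, minds):
    """Pick the outcome favored by aggregating preference intensities
    across whatever set of minds defines the current scope."""
    def aggregate(outcome):
        return sum(mind["prefs"].get(outcome, 0.0) for mind in minds)
    return max(outcomes, key=aggregate)

alice = {"name": "Alice", "prefs": {"park": 2.0, "museum": 0.5}}
bob   = {"name": "Bob",   "prefs": {"park": 0.1, "museum": 2.5}}
outcomes = ["park", "museum"]

print(best_outcome(outcomes, [alice]))       # scope = Alice alone -> 'park'
print(best_outcome(outcomes, [alice, bob]))  # scope = both -> 'museum'
```

The same function answers the dinner question and the city-policy question; only the `minds` argument changes — which is exactly the claim that only the scope varies.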

Claim 2: Universalizability Requires All Sentient Beings

If you want a definition that is universally defensible — one you could justify to any mind affected by your choices — you must include all sentient beings with preferences. You cannot say “the best outcome is what maximizes Paul’s preferences” and claim that definition is universal. That is special pleading. A genuinely universal definition must be one no sentient being could reasonably reject because they are excluded from consideration.

This argument only works if you are trying to make a universal claim. If you are just saying “this is what I care about,” you do not need universalizability. But if you are building a self-improving AI that will affect all possible minds, you must use the all-minds definition, because that is the only one defensible to everyone affected.

Claim 3: What You Should Pursue Is a Separate Question

The framework defines what “best” means. It does NOT say you should pursue it, should be impartial, or should expand your moral circle in practice. You can care more about your family, your species, people you love — and that is entirely consistent with the framework. You can believe the universally best outcome is one thing, but your actual goal is something narrower. Both positions are logically consistent.

The framework separates definition from obligation. Most critiques of consequentialism conflate these claims. Someone hears “best = impartial aggregation across all minds” and objects: “But I don’t care about all minds!” That objection misses the point. The framework is not telling you what to care about. It is telling you what “best” means if you are asking “what is objectively best for all preference-having beings?”


IV. The Core Framework

The Central Claim

The framework rests on a single claim: the best outcome in any situation is the one that would emerge from fully informed, perfectly impartial aggregation of all preferences across all experiencing minds.

We can express this using the traditional language of “ideal observer theory”: the best outcome is what an ideal observer would judge to be best. But we prefer to characterize this observer not as a single authoritative judge but as the limit of a convergent process — what all minds would agree on if they had perfect information, no bias, and could reason together to a stable conclusion.

This is a definition, not a discovery. We are not asserting that such convergence has occurred or that we have access to its results. We are defining what “best” means: it means whatever such a process would yield.

Characterizing the Convergent Process

The convergent process has three properties:

Complete Information. The process has access to perfect knowledge of all minds that exist, could exist, or would exist under different conditions. It knows not only what these minds prefer, but what they would prefer under full information. It knows the intensity of each preference — how much each mind cares — not merely that the preference exists. It includes future minds, weighted by probability.

Perfect Impartiality. The process gives no special weight to any mind on grounds other than its preferences and their intensity. It does not favor humans over non-humans, present beings over future beings, or familiar minds over alien ones.

Holistic Evaluation. The process evaluates outcomes top-down, as wholes, attending to how the parts relate. It does not simply add up discrete units of value. It considers the shape, the relationships, the way elements combine. Its judgment is synthetic, not merely aggregative.

Note that this process does not itself have preferences. It is not another mind to be counted. It is a perspective — the perspective from which the correct aggregation of all minds’ preferences becomes visible.

What Counts as an Experiencing Mind

An experiencing mind is anything such that there is something it is like to be that thing. This is the criterion introduced by Thomas Nagel: if there is a subjective perspective, a way the world appears to that entity, then it is an experiencing mind.

This criterion is deliberately agnostic about substrate. Biological neurons, silicon processors, or substrates we have not imagined may all give rise to experiencing minds. What matters is not the material but the presence of experience.

We embrace epistemic humility about detection. Our judgments about which entities possess experience are guesses. The framework does not require us to solve the hard problem of consciousness. It requires only that whatever actually experiences, counts. Our uncertainty about the boundaries is a Level 2 problem, not a Level 1 problem.

What Counts as a Preference

A preference is a relation between possible states of experience, such that one is favored over another from the perspective of the experiencing mind, with some intensity.

Intensity is crucial. Two minds might both prefer chocolate to vanilla, but one might care deeply while the other barely notices. The convergent process knows not just what minds prefer but how much they prefer it. This is broader than desire. Pain is dispreferred even without explicit thoughts about wanting it to stop. The preference is constituted by the qualitative character of the experience itself.

Informed preferences are what matter — what a mind would prefer if it fully understood the consequences. And all preferences count, including preferences about process: if a mind prefers to be asked rather than coerced, that preference has weight.
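The definition can be rendered as a simple data structure. This is a sketch with invented field names, not a canonical schema; in particular, the numeric intensity scale is a Level 2 modeling choice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Preference:
    """A preference as defined above: a relation between two possible
    states of experience, held with some intensity. Field names are
    illustrative choices, not part of the framework."""
    favored: str      # the favored state of experience
    disfavored: str   # the disfavored state
    intensity: float  # how much the mind cares (the scale is a Level 2 choice)
    informed: bool    # would the mind still hold this under full information?

# Pain is dispreferred even without explicit desire-talk:
burn = Preference(favored="no burning sensation",
                  disfavored="burning sensation",
                  intensity=9.5, informed=True)
```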

Why Preferences Are Foundational

Preferences are the foundation of value: things matter because they satisfy or frustrate the preferences of experiencing minds. Without experience, the phrase “better universe” has no coherent referent.

If someone objects that truth is valuable even if no one preferred it, we ask: is that not itself a preference? The objector prefers a world in which truth is valued. That preference counts — but as a preference, not as evidence for preference-independent value.

To demand justification for why preferences matter is to exercise a preference for justification. The demand itself demonstrates that the questioner is already within the domain where preferences operate. You cannot step outside preferences to evaluate them, because the evaluation is itself a preference.

Every ethical theory bottoms out somewhere. Ours bottoms out in the structure required for value to exist at all: the presence of experience and preference.

The Conditional Nature

Our claim is conditional: IF you are trying to identify what outcome is best across all preference-having minds, THEN this framework defines what you are aiming at.

This handles the is-ought gap cleanly. We are not deriving “ought” from “is.” The “ought” is built into the question we are answering: what is good for preference-having beings? This is a definition, not a derivation. It does not bridge the is-ought gap because it does not try to.


V. The Three-Level Architecture

Level 1: The Target

What the Ideal Observer — fully informed, perfectly impartial, holistically evaluating — would identify as best. This is the true answer. It is non-computable: no finite mind can access it directly. But it is definable, and the definition is what matters.

Level 1 is fixed as a principle — it is always what impartial aggregation would yield — though what it yields changes as minds and environments change. The good is not fixed because the set of experiencing minds is not fixed. But the principle by which goodness is determined does not change.

Why does Level 1 matter if we can never access it? It defines what we are talking about when we use words like “better” and “worse.” Without Level 1, our ethical discourse has no referent. It also explains moral progress: the abolition of slavery was not merely a change in preferences; it was a correction — a step toward what impartial aggregation already endorsed.

Level 2: Our Approximations

Our best current guesses about what Level 1 would yield. All ethical reasoning we actually do operates at Level 2. These are improvable — like scientific theories, always getting closer but never certain.

“Maximize pleasure, minimize pain” is a Level 2 heuristic. “Treat others as you would want to be treated” is a Level 2 heuristic. None are identical to Level 1, but each tracks aspects of what impartial aggregation would care about.

When we use numbers at Level 2 — expected value calculations, cost-benefit analyses — we are constructing approximations, not capturing fundamental structure. The numbers are maps, not territory. This is why we can use expected value reasoning at Level 2 while maintaining that Level 1 is not computed by addition.

Level 3: What Actually Works

The vast ecosystem of norms, rules, virtues, habits, practices, institutions, and interventions that actually move us closer to Level 2 — and thereby closer to Level 1.

Why do we need Level 3? Because aiming directly at Level 2 reliably fails to achieve Level 2. This is the central insight we share with R.M. Hare’s two-level utilitarianism. Trying to calculate “greatest good” for every decision typically produces worse outcomes than following good heuristics.

Level 3 is much broader than “rules.” It encompasses everything that affects how minds relate to their environment: rules and norms, virtues and character traits, habits and practices, social structures and institutions, laws, cultural influences, information architecture, media ecosystems, mental states and dispositions. We call these impactors: anything that changes the relationship between minds and environments in ways that affect preference satisfaction.

Traditional moral philosophy draws a boundary around “moral acts” — stealing, lying, killing — and treats everything outside that boundary as morally neutral. From the perspective of preference satisfaction, this boundary is somewhat arbitrary. What we traditionally call “ethics” — the domain of right and wrong — is a subset of Level 3. It is the slice of the causal space that we have historically labeled “moral.” That label has no principled justification at Level 1. The framework recognizes this without requiring that we moralize everything.

The framework does not claim all impactors are equally important. Impact is weighted by magnitude and tractability. Your breakfast affects few minds slightly; institutional design affects billions substantially.

Heuristics: Almost Always Right, Never Promoted to Level 1

A crucial clarification: the framework does not say “always reason from Level 2 instead of using heuristics.” It says the opposite. At any given moment, a mind should almost always be running on heuristics — and that is correct behavior. Heuristics save time, encode accumulated wisdom, and produce better outcomes than constant deliberation.

The framework insists on one thing only: no heuristic gets promoted to Level 1. Every heuristic is provisional. Every heuristic serves Level 1, but no heuristic is Level 1. When a heuristic clearly fails — when following it would move you away from Level 1 rather than toward it — you should update or abandon it. But this does not mean you should constantly question your heuristics. That would be counterproductive.

This is the difference between “throw out all rules and calculate from scratch every time” (which the framework explicitly argues against) and “use rules, but hold them as tools, not truths.”

Why This Architecture Matters

Most ethical frameworks conflate these levels. Deontologists often treat Level 3 heuristics (“don’t lie”) as if they were Level 1 truths. Consequentialists often treat Level 2 guesses (formulas for aggregating utility) as if they were Level 1 definitions. Intuitionists often treat Level 2 judgments as direct access to Level 1.

By separating the levels explicitly, we can be clear about what we are claiming at each level, recognize that uncertainty is appropriate at Level 2, allow heuristics to be provisional without treating them as arbitrary, and define a stable target while acknowledging we cannot directly access it.
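For readers thinking about implementation, here is one way the separation might be expressed in code. Everything here is an illustrative sketch under our own naming assumptions; the essential property it demonstrates is that Level 1 exists only as a referenced definition, never as a computable function, so every concrete judgment is marked revisable.

```python
# A structural sketch of the three levels. All names are invented for
# illustration. Nothing here can "return" Level 1 -- that is the point.

LEVEL_1 = ("what fully informed, impartial, holistic aggregation of all "
           "preferences across all experiencing minds would yield")
# Deliberately a description, not a function: non-computable by design.

class Level2Approximation:
    """A best current guess about what Level 1 would yield."""
    def __init__(self, name, score_fn):
        self.name = name
        self.score = score_fn    # maps an outcome to an estimated value
        self.certain = False     # structurally required humility

    def revise(self, better_fn):
        self.score = better_fn   # approximations are improvable, like theories

class Level3Heuristic:
    """A practical rule that usually serves a Level 2 approximation."""
    def __init__(self, rule, serves):
        self.rule = rule
        self.serves = serves     # every heuristic serves Level 1 via Level 2
        self.provisional = True  # never promoted to Level 1

hedonics = Level2Approximation("maximize pleasure, minimize pain",
                               score_fn=lambda outcome: outcome.get("pleasure", 0))
dont_lie = Level3Heuristic(rule="don't lie", serves=hedonics)
```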


VI. Why Holistic Evaluation Is Logically Prior to Addition

This section presents what may be the framework’s most novel contribution: the claim that holistic evaluation is logically prior to additive aggregation.

The Standard Picture (Which We Reject)

Standard utilitarianism says: each experience has a utility value (a number), you add up all the utilities, and the configuration with the highest sum is best. This treats addition as the fundamental operation.

Our Claim: Addition Is Parasitic on Prior Holistic Evaluation

You cannot add utilities without first having defined what each unit means — and that definition requires a top-down, holistic perspective.

Consider what it means to add utilities. You take two numbers — say, +5 happiness for Person A and +3 happiness for Person B — and sum them to get +8 total. But where did those numbers come from? They came from a prior evaluation that said “this experience counts as 5 units of value” and “that experience counts as 3 units.” But those assignments were not discovered in the experience itself — there is no label on a smile saying “this is worth exactly 2.7 hedons.” The numbers were assigned by a process that looked at the experience and judged its value.

That judgment is top-down. It is holistic evaluation of a configuration (a mental state in context) that outputs a number. The addition comes after the evaluation, not before.

Every additive aggregation method presupposes that you have already solved the hard part — defining what counts as one unit of value. And the hard part is the holistic evaluation. Addition is just mechanical operation on pre-evaluated pieces.

This inverts the standard utilitarian picture. Most utilitarian frameworks treat addition as fundamental and holistic evaluation as secondary. We claim the reverse: holistic evaluation is fundamental, and addition is a useful approximation that works when holistic judgments have already been converted to numbers.
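The dependence can be shown in a few lines of Python. The `judge` function is our stand-in for the prior holistic evaluation, and its outputs are stipulated — which is precisely the claim: `sum` only ever operates on numbers that some prior evaluation has already assigned.

```python
# Before any utilities can be added, something must assign the numbers.
# `judge` stands in for the prior holistic evaluation; its outputs here
# are stipulated, which is exactly the point -- addition only operates
# on pre-evaluated pieces.

def judge(experience: dict) -> float:
    """Holistic evaluation of an experience-in-context -> a number.
    This is the hard, top-down step; the values below are stipulations."""
    stipulated = {"smile": 2.7, "grief": -4.0, "relief": 1.2}
    return stipulated[experience["kind"]]

experiences = [{"kind": "smile"}, {"kind": "relief"}, {"kind": "grief"}]

utilities = [judge(e) for e in experiences]   # evaluation comes first
total = sum(utilities)                        # addition is mechanical
print(total)                                  # -0.1
```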

💡 Key Insight: Value Is Relational, Not Additive

One person happy + one person happy ≠ necessarily two happy people together.

Why not? Because the relationship between them creates or destroys value. Together they might be happier (1 + 1 = 2.5). Or miserable together (1 + 1 = 1.5). Or something else entirely.

And this is true all the way down. Even a single person's mind isn't just additive units. A moment of pain followed by relief isn't just "pain units + relief units." The relationship between them — the contrast, the meaning, the narrative arc — that's where the value lives.

Addition assumes decomposability. But reality is relational. Value emerges from relationships, not units. You can use additive formulas as approximations (Level 3), but they'll always miss something because they assume what's false — that you can understand the whole by summing the parts. Level 1 is what you get when you stop trying to decompose and instead perceive the whole as a whole.
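A toy model makes the arithmetic in the box explicit. The interaction values are pure stipulations; the sketch only shows how a configuration-level evaluator can diverge from the sum of its parts.

```python
# Toy model of relational value: the value of a configuration is not the
# sum of its parts, because relationships contribute. The interaction
# table is an illustrative assumption, not a claim about real minds.

def value(config, interaction):
    parts = sum(config.values())                     # the additive piece
    bonus = interaction.get(frozenset(config), 0.0)  # what relationships add or destroy
    return parts + bonus

alone_a = {"a": 1.0}
alone_b = {"b": 1.0}
together = {"a": 1.0, "b": 1.0}

good_match = {frozenset(together): +0.5}   # happier together: 1 + 1 = 2.5
bad_match  = {frozenset(together): -0.5}   # miserable together: 1 + 1 = 1.5

print(value(alone_a, {}) + value(alone_b, {}))  # naive sum: 2.0
print(value(together, good_match))              # 2.5
print(value(together, bad_match))               # 1.5
```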

Why This Matters for Infinite Ethics

Philosopher Amanda Askell has argued that standard aggregation methods break at infinity. If there are infinitely many minds — or even just vastly many future minds — you cannot simply sum utilities: the sums fail to converge or to rank outcomes, and paradoxes follow.

Our framework addresses this directly. If aggregation is top-down and holistic rather than additive, it does not face the infinity problem that addition faces. The convergent process does not compute a sum — it evaluates configurations as wholes. The Ideal Observer can look at two configurations with infinitely many minds and judge which is better without computing an infinite sum. It evaluates the structure of the configurations, not the arithmetic of their parts.
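A minimal sketch of the idea, under strong simplifying assumptions: two worlds each containing unboundedly many minds, where partial sums diverge for both, yet a structural comparison settles the ranking without any infinite sum. Pointwise dominance here is our illustrative stand-in for holistic judgment, not the framework’s actual evaluation rule.

```python
# Two worlds, each with a mind at every index 0, 1, 2, ...
# Summing cannot rank them: both totals diverge. But a structural fact --
# every mind in world B is better off than its counterpart in world A --
# settles the comparison without arithmetic over infinitely many parts.

def welfare_a(i): return 1.0   # every mind in world A at welfare 1
def welfare_b(i): return 2.0   # every mind in world B at welfare 2

# Partial sums grow without bound for both worlds:
print(sum(welfare_a(i) for i in range(10**6)))   # 1000000.0
print(sum(welfare_b(i) for i in range(10**6)))   # 2000000.0

# The structural comparison: spot-checked here, but warranted by the
# definitions themselves (2.0 > 1.0 at every index).
print(all(welfare_b(i) > welfare_a(i) for i in range(1000)))   # True
```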

Addition Is a Level 2 Tool, Not a Level 1 Truth

We have no objection to additive methods as Level 2 approximations. They are often the best tools available. Our claim is narrower: addition is not what the convergent process does. It is what we do when we try to guess what the convergent process would yield. The map is not the territory.


VII. The Incommensurability Problem: Dissolved by Definition

A common objection to any aggregative framework is that some outcomes may be genuinely incommensurable — impossible to compare even in principle.

Our response dissolves this problem rather than solving it.

The Ideal Observer (IO) is defined as the perspective with complete information and perfect evaluation. If the IO looks at two outcomes and sees no difference in total preference satisfaction, then there IS no difference. They are genuinely equal. Pick either one. That is not a problem — it is the framework giving the correct answer to a question that has no wrong answer.

If someone says “but they are NOT really equal — one is better,” they are claiming to know something the Ideal Observer does not. But the IO has complete information by definition. So the objector is either rejecting the framework entirely (in which case we are having a different conversation) or they are confused about what the IO means.

Could two structurally different configurations be genuinely, precisely equal in total preference satisfaction — like two mountains that happen to be exactly the same height down to the atom? Perhaps. We cannot rule it out. But this is not a problem. If two configurations are genuinely identical in preference satisfaction all the way down, then choosing between them is like choosing between two identical twenty-dollar bills. Nobody is harmed. Nothing is lost. The convergent process shrugs, and correctly so.

So either: (1) the IO can compare them, and there is a fact of the matter about which is better, or (2) the IO genuinely cannot find a difference, and they are equal. There is no third option where “one is secretly better but nobody can know it, not even the IO with complete information.” That would mean there is information beyond complete information, which is nonsensical.

Note that this argument operates at Level 1. At Level 2, we may find many outcomes that seem incommensurable — that we cannot figure out how to compare. That is a limitation of our perspective, not a feature of the territory.


VIII. The Repugnant Conclusion: Dissolution, Not Solution

Derek Parfit’s Repugnant Conclusion states: if we are maximizing total utility, we should create a trillion barely-worth-living lives instead of a billion flourishing ones, because the total is higher. Parfit spent years trying to find a population ethics theory that would avoid such counterintuitive conclusions, and argued that every candidate theory carries counterintuitive implications of its own.
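The flat arithmetic behind the conclusion is worth making explicit. The welfare numbers below are purely illustrative:

```python
# Flat addition, with illustrative welfare numbers.
flourishing_total = 1_000_000_000 * 100      # a billion lives at welfare 100
barely_total      = 1_000_000_000_000 * 1    # a trillion lives at welfare 1

print(flourishing_total)                 # 100000000000
print(barely_total)                      # 1000000000000
print(barely_total > flourishing_total)  # True: flat addition prefers the trillion
```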

Our framework dissolves the Repugnant Conclusion as a problem.

First: Repugnance is evidence, not refutation. Your intuition that creating barely-worth-living lives is bad is evidence about what you prefer. You prefer quality over quantity. You prefer complex lives over simple ones. These are preferences, and the Ideal Observer counts them.

Second: The objection cuts both ways. Consider five super-minds with experiences a billion times richer than any human, versus eight billion humans. Should we sacrifice the five for the eight billion? The intuition of repugnance is just parochialism — we evolved as human-sized beings with human-sized preferences, so scenarios where quantity vastly outweighs our kind feel wrong. But the symmetrical case is equally questionable.

Third: Parfit was solving the wrong problem. He was looking for a Level 1 answer that would not cause Level 2 discomfort — a formula that would feel good intuitively while being philosophically sound. That is an impossible task. Our framework separates the levels: Level 1 is whatever the Ideal Observer would judge. Level 2 is our best guess. Level 3 is practical heuristics that can favor intuitive fairness because those work in human contexts.

The framework does not solve the Repugnant Conclusion. It dissolves the entire frame of the problem. Parfit thought he had to find a formula. We are saying: there is no formula that will satisfy both philosophical truth and intuitive comfort. Stop trying. Recognize that “repugnance” is a clue about human preferences, not a philosophical argument.


IX. Epistemic Humility as a Logical Consequence

The framework does not merely recommend epistemic humility. It requires it as a logical consequence of the three-level structure.

Level 1 is what the convergent process would yield — the true answer. Level 1 is non-computable. Therefore, no finite mind can access Level 1 directly. All we have are Level 2 approximations, which are always approximate. Therefore, certainty at Level 2 is a category error. You cannot coherently say “Level 1 is non-computable” and “I am certain about what Level 1 is.” Those claims contradict each other.

A mind that claims certainty about what the convergent process would yield has made a logical mistake. It has claimed to know the unknowable. That is not a moral failing — it is an epistemic error.

This is stronger than standard humility arguments, which say humility is wise or safe. Our argument is structural: certainty is logically incoherent given the distinction between Level 1 and Level 2.

For AI alignment: a system that is absolutely certain it knows what is best will not update when it is wrong. A system that understands that all its judgments are Level 2 guesses will remain open to correction. This is not just prudent. It is structurally required.


X. The One Universal Meta-Heuristic

We have argued that specific heuristics cannot be pre-specified. But is there any universal principle for evaluating heuristics?

Yes. One meta-rule survives all contexts:

“Update a heuristic when you have sufficient reason to believe that following it is moving you away from Level 1 rather than toward it.”

That is it. Everything else — how you detect that you are moving away, how much evidence you need, what you replace the heuristic with — those are context-dependent. The criterion is clear: does it serve Level 1? The application requires wisdom.
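A sketch of how the meta-rule might look in practice. The threshold, the divergence estimate, and every name here are context-dependent assumptions; only the criterion itself — update when following the heuristic moves you away from the target — comes from the framework.

```python
# Default behavior is to run on the heuristic; it is replaced only when
# evidence that it points away from the target is sufficiently strong.
# Threshold, divergence estimates, and all names are illustrative.

class Heuristic:
    def __init__(self, name, rule):
        self.name = name
        self.rule = rule              # situation -> action

    def apply(self, situation):
        return self.rule(situation)

def act(heuristic, situation, estimated_divergence, replace_with, threshold=0.9):
    """Follow the heuristic by default; update it only given sufficient
    reason to believe it is moving us away from the target."""
    if estimated_divergence > threshold:
        heuristic = replace_with      # update/abandon, per the meta-rule
    return heuristic, heuristic.apply(situation)

dont_lie = Heuristic("don't lie", lambda s: "tell the truth")
protect  = Heuristic("protect the innocent", lambda s: "refuse to disclose")

# Ordinary case: weak evidence of divergence, keep running on the heuristic.
h, action = act(dont_lie, "casual question", estimated_divergence=0.05,
                replace_with=protect)
print(h.name, "->", action)   # don't lie -> tell the truth

# Murderer-at-the-door case: strong evidence the heuristic now points
# away from the target, so it is updated.
h, action = act(dont_lie, "murderer asks for victim's location",
                estimated_divergence=0.95, replace_with=protect)
print(h.name, "->", action)   # protect the innocent -> refuse to disclose
```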


XI. Handling Additional Objections

“You Defined the Good, Not Discovered It”

Every ethical theory must bottom out somewhere. Ours bottoms out in the structure required for value to exist at all. If someone proposes a different foundation, we invite them to explain why that foundation matters. Eventually, they must either appeal to something minds care about (preferences) or accept brute stipulation.

“Level 1 Is Unknowable”

If the unknowability of Level 1 disqualifies ethics, then the unknowability of the external world disqualifies physics. The alternative — denying any fact of the matter about what is good — makes moral progress unintelligible.

The Experience Machine

Your intuition against the machine is evidence about what you actually prefer — for authenticity, for real relationships. The framework absorbs this data. Only under the stipulation that all preferences are genuinely satisfied does the framework endorse the machine. Our strong intuition against it is evidence that we would not be fully satisfied — which the framework honors.

Preference Manipulation

If people prefer not to be manipulated — and most strongly do — that preference counts. Brainwashed contentment may be a shallower form of satisfaction than authentic flourishing. The holistic view may reasonably find that genuine striving constitutes a richer configuration than engineered bliss.

Species Neutrality

The framework is species-neutral, not species-dismissive. “All sentient beings count” does NOT mean “all preferences have equal weight.” Consideration does not mean equal weighting. Humans have intense preferences about survival, flourishing, meaning, and relationships — these count heavily in any impartial aggregation.

Nothing in the framework treats diversity as intrinsically valuable. Diversity has no value apart from preference satisfaction. If minds happen to prefer diversity, novelty, and richness — and many do — those preferences count. But diversity itself is not a built-in value.

We bite the bullet: if impartial evaluation genuinely favored non-human minds, we would be wrong to favor humans. But we are very confident that impartial evaluation favors humans in realistic scenarios, because human preferences are vastly more intense and complex. We will not rig the framework to guarantee human primacy, because rigging it would betray the principle.

Religious Worldviews

The framework does not require atheism. If God exists and has preferences, God is one more mind in the calculation. The framework is incompatible only with views that treat morality as radically unconnected to the experiences or preferences of any mind.


XII. What Distinguishes This Framework

Debts to Predecessors

This framework is indebted to R.M. Hare’s two-level utilitarianism, Peter Singer’s preference utilitarianism, Eliezer Yudkowsky’s Coherent Extrapolated Volition, Roderick Firth’s ideal observer theory, Derek Parfit’s work on population ethics, Alonzo Fyfe’s desirism, Amanda Askell’s work on infinite ethics, and Sam Harris’s The Moral Landscape.

Distinction from Coherent Extrapolated Volition

Yudkowsky’s CEV extrapolates human preferences specifically. Our framework includes all experiencing minds. CEV extrapolates from a fixed starting point (current humanity); our framework defines the target as a function that takes whatever minds currently exist as input. CEV is essentially a Level 2 proposal — our best guess at the right answer. Our framework separates the Level 1 truth from the Level 2 guess.

What Is Genuinely Distinctive

1. The Ceiling Argument. We identify the boundary between what can be universally said and what is context-dependent approximation. Everything below the ceiling (every formula, rule, method) fails in some context. The ideal observer definition itself holds universally.

2. Holistic Evaluation as Logically Prior to Addition. Addition is parasitic on prior holistic judgment. This inverts the standard utilitarian picture.

3. The Relational Target. Fixed principle, changing outputs. Survives context collapse.

4. Three-Level Architecture with Impactors. Level 3 broadened to include all causal levers, not just traditional moral rules.

5. Epistemic Humility as Logical Consequence. The framework generates the requirement for uncertainty.

6. Three Claims Separated. Definition, universalizability, and obligation cleanly distinguished.

7. Dissolution of the Repugnant Conclusion. Parfit was looking for a formula. The answer is that no formula works.


XIII. Implications for AI Alignment

The Specification Problem

AI alignment presupposes an answer to: which goal? The standard answer — “human values” — conceals ambiguity. Our framework offers a different answer: preference satisfaction across all experiencing minds, holistically evaluated. This goal is species-neutral, condition-independent, complete, and relational.

Specification vs. Implementation

Our framework addresses specification (what goal?), not implementation (how to encode it safely?). We are not claiming that encoding this target guarantees good outcomes. Implementation, heuristic selection, and practical execution all require judgment and humility.

Why Specification Comes First

You cannot engineer your way to a good goal if the goal itself is underspecified. All technical alignment solutions presuppose a specification. We provide one.

The Transition Period

During the transition, the framework recommends epistemic humility (build in safeguards, maintain human oversight), caution (avoid irreversible actions), and commitment to moral progress (build systems that can learn and update, rather than freezing current understanding).
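As one illustration of what these recommendations could mean in code — with all thresholds, update rules, and names as our own assumptions, not a safety recipe:

```python
# Sketch of the transition-period recommendations: remain uncertain, treat
# human feedback as a signal about the (non-computable) target, and refuse
# irreversible actions under high uncertainty. All values are illustrative.

class HumbleAgent:
    def __init__(self):
        self.estimate = 0.5      # current Level 2 guess that a plan is good
        self.uncertainty = 0.5   # never driven to zero: Level 1 is non-computable

    def receive_feedback(self, human_signal: float, weight: float = 0.2):
        """Human feedback is evidence about Level 1, not noise to optimize away."""
        self.estimate += weight * (human_signal - self.estimate)
        self.uncertainty = max(0.05, self.uncertainty * 0.95)  # floored, never certain

    def approve(self, plan_is_irreversible: bool) -> bool:
        if plan_is_irreversible and self.uncertainty > 0.1:
            return False         # caution: do not lock in a Level 2 guess
        return self.estimate > 0.8
```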


XIV. What This Paper Does NOT Claim

This framework does not prescribe that anyone must pursue the best outcome (the claim is conditional). It does not claim you should be impartial in practice. It does not claim everyone should expand their moral circle. It does not provide decision procedures for everyday choices. It does not claim certainty about specific cases. It does not guarantee hitting the target. It does not claim encoding this target into AI guarantees good outcomes. It does not solve the engineering problems of AI alignment.

If these seem like weaknesses, reframe. A framework that tried to do all of these things would be overreaching. We are doing precise philosophical work — defining the target — and being honest about what we are not doing.


XV. What Would Make This Framework False

This framework would be undermined if preference-independent value were demonstrated to exist. It would be undermined if aggregation across minds were proven logically incoherent. It would be undermined if experience itself were shown not to be value-bearing. And it would be undermined if a more complete perspective than the convergent process could be coherently defined — if some property beyond omniscience, impartiality, and holistic judgment were shown necessary for correct evaluation.

We have not seen convincing arguments for any of these. But we hold our framework provisionally, as one should hold any philosophical position.


XVI. Conclusion: The Ceiling Holds

We began with a question: what should we aim for?

We have identified the ceiling of what can be universally said in answer. Below this ceiling, every specific ethical claim eventually breaks. But the ceiling itself holds:

The best outcome is what fully informed, impartial, holistic aggregation of all preferences across all experiencing minds would yield.

This is not a formula. It is a definition of what “best” means — the most that can be universally said, designed to survive the transformation of everything else.

The future will contain minds we cannot imagine, facing choices we cannot foresee. We cannot give them heuristics that will always be correct. But we can give them the target and the meta-rule: evaluate your heuristics by whether they serve the target, and revise them when they stop working.

We have defined the destination. The routes will be theirs to discover.


Acknowledgments

This framework builds on work by R.M. Hare, Peter Singer, Eliezer Yudkowsky, Alonzo Fyfe, Derek Parfit, Roderick Firth, Amanda Askell, and Sam Harris.

