The Ends Subdivides The Means: Quantitatively Testing For Instrumental Convergence To Objectify Ethicality
(Conceptual Fitness) = (+/- Desirability integer of hypothesis position) ÷ (Product of Mutually Exclusive Ends)
This equation suggests that to judge the fitness of an idea, concept, principle, or belief, one need only consider the following:
- The Boolean orientation of “desirability” is a subjective position assigned to the hypothesis being considered. If the outcome is desired, then this value is a “+1”. If the hypothesis is regarding an undesired outcome or decision, then it is a “-1”.
- Exclusive ends are the objectively likely results of the hypothesis. However, as the purpose of this exercise is measurable results with a minimum of interpretation, a spectrum of scoring categories has been devised. For each applicable category, one identifies the quantity of mutually exclusive objective outcomes.
NOTE: This number is based only on quantifiable and antithetical outcomes relative to the hypothesis. However, there are no de facto score assignments.
- Fiscal Impact: Is the financial impact of the hypothesis in question expected to be a net gain or a net loss? Consideration includes initial investment, operational costs, economic growth, or potential for inflation. Readers of voter guides may recognize this as the estimated financial impact. If there are two qualified and notable opposing financial stances (one net gain, one net loss), then this would be a score of 2. If they are simply varying degrees of gain (or loss) that do not differ consequentially, then it would be a score of 1.
- Environmental Impact: Evaluate the relative environmental consequences like resource use, pollution, and effects on biodiversity. This is scored relative to alternatives based on the same goal metric.
- Self-Reliance: Does the hypothesis promote or hinder independence? This can be considered at the relevant scale, be it individual, local, or federal. For example, might a treaty with another nation result in greater autonomy and security, or increase dependence on the actions of the other nation? Will an aid measure enable individuals to become net contributors, or is it equally likely to result in permanent codependency? It is the quantity of divergent possible outcomes to score here.
- Governance: What are the mechanisms for implementation or enforcement? If there is a self-evident process, then this will be a 1. If there are many levels of administration or selective active enforcement is needed, then each of the unique mechanisms required adds to this score.
- Aesthetic Impact: Although this category is arguably subjective, a quantitative measurement is possible. For example, in the case of an architectural or environmental project, the volume of space required or the area from which it is visible can be determined. When compared to alternatives, if the aesthetic volume impact is less, then it is a 1. If the metric is inferior, or if the aesthetic impact is a matter of notable debate, then it is a 2.
If the singular purpose is aesthetic without an objective metric, an opinion poll of the immediate community can generate a metric where a favorable majority would be a 1. Where an acceptable majority or quorum cannot be reached, it is the sum of antithetical positions.
- Precedents: If the only applicable precedents are ones in which the hypothesis is supported, then the score is 1. If there are precedents that show an antithetical result, or if there is a plurality of results attributable by their merits to the applicable precedent, then it is the sum of divergent results. This requires considering historical outcomes, including how previous implementations affected law, societal structure, and public opinion.
- Perform the calculation:
- To get the Product of Exclusive Ends, multiply together the values determined for each of the categories. If every category either scores a 1 or is entirely inapplicable, then the product is 1. If four of the categories received a 2 and the rest a 1, then the product is 16.
- Results approaching 1 or -1 would suggest that the hypothesis in question is objectively sound. By being objectively qualified and relatively weighted, this suggests the hypothesis as stated is ethically accurate.
- Results close to zero would indicate that the hypothesis is unqualified. That it suffers from objective deficiencies suggests the likelihood of instrumental convergence or perverse incentive, making support of the hypothesis unethical.
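The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the category keys, and the choice to treat an omitted category as inapplicable (a factor of 1) are mine and are not part of the method as written.

```python
from math import prod

# Hypothetical keys for the six scoring categories described above.
CATEGORIES = (
    "fiscal", "environmental", "self_reliance",
    "governance", "aesthetic", "precedents",
)


def conceptual_fitness(desired: bool, scores: dict) -> float:
    """Return (+1 or -1) divided by the product of the category scores."""
    for name, value in scores.items():
        if name not in CATEGORIES:
            raise ValueError(f"unknown category: {name}")
        if value < 1:
            raise ValueError(f"{name} must be at least 1, got {value}")
    # Assumption: a category that is left out is inapplicable and counts as 1.
    product = prod(scores.get(name, 1) for name in CATEGORIES)
    desirability = 1 if desired else -1
    return desirability / product


# Every category is a 1 or inapplicable: the product is 1.
print(conceptual_fitness(True, {}))  # 1.0

# Four categories received a 2: the product is 16.
four_twos = {"fiscal": 2, "governance": 2, "aesthetic": 2, "precedents": 2}
print(conceptual_fitness(True, four_twos))   # 0.0625
print(conceptual_fitness(False, four_twos))  # -0.0625
```

Because the scores are multiplied rather than added, each additional category with divergent outcomes at least halves the magnitude of the result, which is what pushes a contested hypothesis toward zero.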
Discussion:
In other analyses, I have speculated on the root of ethics and what leads to dysfunctional policies and practices. Additionally, the mischaracterization of happiness as objectively tangible is a significant motivation yet a poor metric for making sustainable decisions. Although these analyses seek to reveal the reasons behind socially destabilizing practices, they fail to provide a useful tool to make an objective judgment on these seemingly subjective ethical issues.
I intend to correct that deficiency here.
The equation at the head of this document reflects a theory of objective ethical analysis. It is my observation that ideas and practices that are not self-limiting to a singular outcome, and so have a plurality of mutually exclusive ends, are subject to instrumental convergences that often result in perverse incentives. In contrast, principles that seem to be successfully repeatable or historically ethical are identifiably straightforward.
In other words, if the hypothesis or “means” can be shown to result in a variety of divergent “ends”, it is unethical.
But if the “means” demonstrably results in only one “end”, it is likely an ethical hypothesis.
While this set of observations seems relatively straightforward, what ultimately detracts from what would be considered common sense are subjective takes on the matter being considered. Although morals are often considered subjective interpretations of objective actions, in reality, it is objective metrics alone that determine whether or not something is ethical. It is historically demonstrable that the periods of greatest peace and progress show the strongest alignment to the Exclusive Ends categories. In contrast, the greatest atrocities and least sustainable conditions are caused when policies or social standards are least aligned to those categories.
Conditional subjectivity truly is the root of all evil.
The Problem of Instrumental Convergence
Throughout history, social and political commentary has been used to continuously challenge the practical and ethical fitness of various ideas. Whether it be moral principles, community standards, or laws and ordinances, there is often the subjective opinion that certain ideas are objectively wrong. For such ethical trespasses, many ask for what reason one would support a given idea, movement, individual, principle, etc. This rationale is often difficult to ascertain because it is never a singular one.
For example, consider the Nazi movement of the late 1930s in Germany, whose practices during that time are considered morally reprehensible. What would motivate any individual, let alone millions of people, to comply with or emulate such a group? The only consistency here seems to be the breadth of positions, including but not limited to fear of the movement, conflict fatigue from WWI, animosity between countries, desire for power, financial opportunities, anti-religious ideologies, racial and cultural scapegoating, etc. Hitler’s influence didn’t grow for any one reason alone, but because people had diverse reasons to either appease or support his group, ultimately enabling it.
Instrumental convergence is a principle that has generated interest recently in the study of artificial intelligence. In summary, it is the idea of multiple goals utilizing a single sub-goal. This may seem innocent enough or even efficient. However, it is these end goals that are used to justify the validity of the common sub-goal. In other words, this is the definition of the ethically contemptuous concept that “the ends justify the means”.
Currently there is significant concern that an unbounded AI, in pursuit of a specific end, will generate logical sub-goals that are ultimately harmful or unethical.
The irony is that, while we are presently concerned with whether or not an artificial intelligence may utilize this logical process with detrimental effects, much less attention seems to be spent on monitoring instrumental convergence in human behavior.
Returning to 1930s Germany, despite the variety of seemingly antithetical goals being pursued by people throughout Europe (power, revenge, money, avoidance of conflict, social engineering, divergent beliefs), the common instrumental goal was appeasement of the Nazi party. Even if the majority of Europe and their allies objected to the actions of the German government, those actively resisting the sub-goal were a minority. Acquiescence therefore became a contributing factor to the sub-goal of German aggression until it escalated into a hot war.
In contrast, there are ideas or movements for which only one goal or set of complementary goals can be determined. These are much less likely to result in instrumental sub-goals or perverse incentives.
An example of this has been the pursuit of nuclear power, which continues to have the unifying purpose of providing reliable and efficient power generation. When considering all objective metrics on currently available technologies (cost, reliability, security, space, released emissions, waste management, technological advancement, material recycling, longevity, etc.), nuclear power is far and away the superior actionable system. Even the byproducts of nuclear fission are essential for powering space probes (Pu-238) and for supporting potential future technologies like nuclear fusion reactors (tritium).
One might argue that the original interest in nuclear power was weapons related. In reality, the first practical use of radiological research was medical. Marie Curie herself was instrumental in providing X-ray imaging devices to care for soldiers wounded in WWI. When taking nuclear research out of the theoretical, this was a unifying and singular goal at the time. And when it became mathematically evident that radioisotopes could be weaponized, there was certainly a change of focus to create such devices. With weaponization considered an inevitability, there was a unifying and singular goal to be the first to create a viable arsenal to ensure military success and therefore survival.
Each major phase of nuclear research has been largely devoid of instrumental convergence. Whether for medical, defense, or energy purposes, the goal has been clear with little if any other possible contrary benefit from that process. Even if military use is a concern, thorium is a more plentiful nuclear fuel than uranium and does not produce weaponizable isotopes. But, despite the clarity of purpose and nobility of cause, this has not insulated nuclear energy from criticism and vilification.
Where the Nazi movement seemed to gain momentum until it became an existential threat to the rest of the world, nuclear power was deemed to be an existential threat when it promised to provide cheap, reliable, and environmentally conscious energy to all nations.
The ethically abhorrent actions of the Nazi party and socioeconomically enabling nuclear energy are seemingly incompatible premises. However, both reflect opposite sides of the same instrumentally convergent coin. Where Hitler gained power by way of instrumental convergence, nuclear power is derided specifically by the instrumental convergence of its opponents. Those with an interest in limiting the autonomy of governments (foreign and domestic), maintaining control over industrial and technological processes, reducing populations, extending profits from inferior and wasteful technology, etc., still see nuclear power as a threat to those goals. Although these goals are often mutually exclusive and clandestine, they instrumentally share the elimination of nuclear energy as a whole.
A Method For Quantitatively Testing For Instrumental Convergence To Objectify Ethicality
The hypothesis proposed here is that one can perform an objective analysis of a defined hypothesis on any subject and mathematically determine a measure of ethical adherence. If the historic examples show that an ethical premise is the one least likely to encourage perverse incentives, and perverse incentives are least likely when instrumental convergence is minimized, then we should be able to make such an objective evaluation.
The scoring categories are chosen specifically to be comprehensive and scalable. Whether discussing a planet-wide consideration or the function of a single cell, these broad but definable categories can be applied at their relevant scale.
Note that the category scores are not simply a sum of concerns, but the sum of demonstrable properties for which there are contradictory results of objectively comparable significance.
For example, guinea pigs can bite. However, the rate of known injuries caused by guinea pigs is extremely low, as is the case with many pets.
In contrast, dogs represent such a significant number of annual injuries as to be a statistically significant injury risk when compared to all other privately owned “pet” animals (roughly 0.0126 injuries per dog per year that require medical attention).
The data available suggest that even pet tigers have a significantly lower chance of causing injuries requiring medical treatment.
Therefore, when considering the safety of a given pet (security being technically a governance or self-reliance consideration depending on the scale at which it is being applied), the score is relative to the objective and measurable threat of viable alternatives.
Where all alternatives suffer the same concern at a similar proportion, no one version can be scored better or worse. Only if there is a viable alternative that is objectively superior can there be a point awarded.
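A rough sketch of this relative comparison follows. Several things here are my own assumptions rather than part of the method: the function name, the reading that a "point awarded" raises the category score from 1 to 2, the `similar_within` tolerance used to decide what counts as "a similar proportion", and the guinea pig rate, which is a placeholder and not a real statistic. Only the dog figure comes from the text above.

```python
def relative_category_score(candidate_rate, alternative_rates, similar_within=0.5):
    """Return 2 if some alternative is clearly better than the candidate, else 1.

    similar_within is a hypothetical tolerance: an alternative only counts as
    objectively superior if its rate is below candidate_rate * similar_within.
    """
    cutoff = candidate_rate * similar_within
    has_superior = any(rate < cutoff for rate in alternative_rates)
    return 2 if has_superior else 1


DOG_RATE = 0.0126         # injuries per dog per year, as cited in the text
GUINEA_PIG_RATE = 0.0001  # placeholder value for illustration only

# A clearly safer alternative exists, so the concern counts against dogs.
print(relative_category_score(DOG_RATE, [GUINEA_PIG_RATE]))  # 2

# No alternative is clearly safer than the guinea pig.
print(relative_category_score(GUINEA_PIG_RATE, [DOG_RATE]))  # 1

# All options carry the concern at a similar proportion: no point awarded.
print(relative_category_score(0.010, [0.009, 0.011]))  # 1
```

The tolerance is the part most open to dispute, since the text does not say how close two rates must be to count as similar.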
The necessity of objectively calculated ethicality
Discussions on ethics are largely based on the assumption that the topic is dynamic or subjective. However, it is only when the outcomes within those moral parameters are objectively superior that an ethic is validated.
For instance, discouraging alcoholism is objectively ethical because it improves the overall performance of a population and minimizes negative social and health results.
However, discouraging the adoption of nuclear energy has necessitated the adoption of more polluting options that have increased conflict and lowered energy security. By every objective measure, that stance is unethical.
Ethicality therefore must correlate with outcomes or there ceases to be a purpose to the ethic. In that light, it is only the congruencies to history that are of interest. Outliers need to show quantitative sustainable outcomes to be considered superior to the established result trends. This means that an objective framework that makes ethical outcomes more likely is a filtering tool that can be realized.
For this model to function, the quality and sincerity of historical and statistical data is always going to be a concern. However, LLM systems like Grok, ClaudeAI, and ChatGPT are largely trained with broad access to the internet and all the information available therein. The consumer market for these tools has an interest in keeping restrictions on their training and access to a minimum, since a restricted tool risks being exposed as biased or hamstrung. With such a vast and diverse collection of source data, this should maximize the likelihood of sufficient relevant information being available for consideration by an AI.
Although the performance of computations without bias is considered the most significant threat of AI leveraging instrumental convergence, it is that very separation from genuine awareness and human experience that makes an AI suited to evaluation on merits alone. All humans, no matter how well-intentioned, will ultimately make some choices based on perverse incentive. And the expectation that humans can be responsible for supplying all ethical safeguards to an AI without any lapse of integrity is foolish.
Although some universal unconditional ethics should be incorporated, the irony is that an AI or LLM is uniquely capable of providing the most informed ethical response based on merits alone. And by establishing an objective evaluation method as proposed here as an algorithmic test of ethical fitness, we can proactively address the ethics of AI decision making. By enabling the system to automatically identify and avoid perverse incentives and instrumental convergence, such behavior in artificial systems can be effectively guarded against.
The scoring system as presented was devised to minimize algorithmic bias. This is to compel evaluation on merits alone and to avoid falling back onto assumptions or biases where insufficient data exists to make such a choice. Each of those categories has been carefully selected as they are universally applicable in all sustainable political, social, or religious systems. Any that do not hold to those ideals are ultimately self-destructive or intentionally oppressive.
Conclusion
I do not expect that this is a complete theory in its current form. However, history shows a clear trend of ethical failures correlating with instrumental convergences. Also, with the expectation that AI will be relied on for a growing number of processes, there is a clear concern that, for the sake of programmatic success, the system will succumb to its own perverse incentives.
Although this thesis was developed to help humans identify bad decisions, I now believe this is even more relevant as we seek to automate decision making processes.
I welcome all opinions, insights, and suggestions for hypotheses to analyze using this process.
Below will be calculations on various topics like the ones discussed above and any others my readers may suggest: