Consolidation mechanisms for AGI labs should be connected to AI capabilities
TLDR
AGI companies should have coordination points for consolidation that are directly tied to the capabilities of AI systems, specifically the ability of AI systems to dramatically decrease the chance of catastrophe from misaligned AGI.
Context
OpenAI’s charter says “if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project.” This is a good sentiment, but it’s poorly defined. One of the more glaring issues with this clause is that the bar for what counts as “AGI” is vague, even per Sam Altman himself.[1] In particular, I expect the bar to keep rising, so that e.g., in 2030 humans are dealing with AI systems that they label “not AGI” but that we in 2023 would have said count. I want AGI companies to be able to coordinate in order to prevent existential catastrophe from misaligned AGI, but there may not be obvious default coordination points.
Alaga and Schuett recently released “Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers”, in which they discuss how AGI companies could plan to pause frontier AI development when a model fails safety evaluations. I think this is an exciting idea and likely one that both AI labs and governments should be pursuing, with the goal of getting voluntary commitments in the interim and binding international regulation eventually. Other related work includes Anthropic’s Responsible Scaling Policy, which outlines procedures to pause scaling if certain safety criteria are not met, and “Model evaluation for extreme risks”, which lays groundwork for these evaluations.
In general, recent work has mainly been concerned with pausing AI development based on the existence of dangerous capabilities, e.g., autonomous replication, use for biological weapon development, and cyberattacks. It seems desirable to pause scaling before developing and deploying models that have extremely dangerous capabilities.
Two coordination points: Pausing and Consolidation
It might be a useful model to think of AGI labs as having two responsibilities: first, you coordinate to avoid destroying the world yourself; then, you coordinate to save the world from others destroying it.[2]
The first of these coordination points is likely a pause or series of pauses in frontier development — like those discussed above — with the primary purpose of preventing a coordinating lab from imminently deploying a system that is likely to cause significant harm. The second coordination point is about substantially reducing the chance of misaligned AGI being deployed at all, and it benefits substantially from the consolidation of AGI development. Both of these may require multiple steps (e.g., multiple pauses for safety, incremental consolidation) rather than being a single point, so they may be better thought of as classes of coordination.
I’ll flesh out this second coordination point in more detail. Even if responsible AI developers are able to coordinate and pause to avoid deploying catastrophically dangerous AIs, other actors will soon (likely months to years later) be able to develop these systems too, and may not be willing to pause. At this point, I expect numerous significant changes will be needed to move the world to an existentially secure state, where there is almost no risk of the deployment of misaligned AGI. This second coordination point is an opportunity for leading AGI labs to merge in order to tackle these difficult problems with increased resources and decreased race pressure. Here are some example projects that this merged lab may tackle:
- Massively scale AI alignment research using automated researchers, aiming to develop techniques that are likely to work for superintelligence alignment
- Massively scale AI alignment research using enhanced humans (e.g., brain-computer interfaces, whole brain emulations)
- Work with governments around the world to institute a compute monitoring regime which makes it nearly impossible for anybody to train misaligned AGI
- Set up a global surveillance regime to the extent needed to prevent the training of misaligned AGI
- Absorb other major AI development projects and their resources, ideally through standard measures (e.g., acquisition) but potentially through extreme ones (e.g., cyberattack)
In 2023 it’s hard to know what will be needed to secure the world. Hopefully this can be done through legally and ethically sound strategies, but the merged lab should be open to violating these constraints (e.g., surveillance, cyberattacks, skipping IRB approval) if that’s what it takes to succeed — we should all be willing to go to significant lengths to prevent human extinction. Such decisions should not be made lightly, as they may come at significant costs.
Consolidation of the AGI labs in order to accomplish these projects is mainly useful because it reduces competitive race pressures between them, allowing them to move more slowly and focus on safety. It also gives the merged lab substantial resources to invest into these projects, increasing their chance of success (and increasing the amount that can be done before resorting to less desirable methods).
Consolidation Schelling points
Current conversations focus on how pausing should be tied to hitting dangerous capability thresholds. I posit that consolidation should be tied to hitting other, positive capability thresholds. In particular, these thresholds should be based on the desirable projects that a merged lab may want to tackle, with the caveat that there may be other triggers for consolidation, such as imminent risks that can be better tackled by a merged lab.
Beyond indicating the ability to carry out the above projects, a consolidation Schelling point — a default place to coordinate and decide to consolidate — should be low enough that it is reached when it needs to be. That is, if this point sits further along the tech development tree than much of the existential risk, this mechanism would not be so useful for preventing such risks. While our actions affect which areas of the tree are explored and unlocked, there may be immutable facts about the tree which make the situation difficult (e.g., it may be a fact about the tree that a majority of paths pass through danger before they reach useful and safe states).
Some potential Schelling points here: a lab gaining a large enough cybersecurity lead over other labs that it could steal their IP and significantly hurt their development if it wanted; solving a challenging problem, like providing a proof or disproof of the Riemann Hypothesis; achieving a whole brain emulation/upload; automating 95% of internal AI R&D (i.e., making progress 20x faster than in 2023).
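To unpack that last parenthetical (this is my own back-of-the-envelope reading, not a calculation spelled out in the post): if 95% of R&D tasks are automated and the automated portion takes negligible time, the remaining 5% becomes the bottleneck, which gives roughly a 20x speedup, Amdahl's-law style.

```latex
% Back-of-the-envelope, assuming the automated fraction p of R&D takes ~0 time
% and the remaining (1 - p) of the work proceeds at the old pace:
%   speedup = 1 / (1 - p)
\[
  \text{speedup} \;=\; \frac{1}{1 - 0.95} \;=\; 20\times
\]
```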
A notable example of a consolidation Schelling point is a lab 100x-ing its alignment research output via automated researchers (perhaps compared to 2023, or compared to the previous month; I don’t pretend to know which particulars would be best). This is very likely a point at which other labs should subsume themselves into that effort and contribute to it. Unfortunately, this capability level may be too high. My current guess is that if the specific metric is “100x more quality-adjusted AI alignment research in month n than in month n-1,” we are unlikely to see these kinds of jumps before we face significant existential risk from misalignment.
An initial way to approach deciding on the specifics would be to venture guesses for how strongly I would want my future self to consolidate if different thresholds are hit, for instance: if 100x alignment research improvement in one month over the previous month, 99% merge; if 30x month-to-month output, 95% merge; if 5x month-to-month output, 60% merge; if 50% improvement in one year over the previous year, 10% merge; if 300% year-to-year improvement, 25% merge. These are not supposed to be calibrated probabilities so much as a suggested ordering of how strong the case for merging seems in different situations, from the current outlook. The main failure modes to avoid are defining too easy a bar (and thus running into the “we have this capability that people would have said was AGI a decade ago, but this definitely doesn’t seem like AGI” problem) and defining a bar so high that it doesn’t get hit before it’s too late.
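For concreteness, here is a toy sketch (my own illustration, not a scheme proposed in the post or by any lab) of how the rough guesses above could be encoded as a threshold table; the numbers mirror those guesses and are explicitly uncalibrated.

```python
from typing import Optional

def merge_inclination(monthly_multiplier: Optional[float] = None,
                      yearly_multiplier: Optional[float] = None) -> float:
    """Rough 0-1 'strength of the case for merging', given observed growth in
    quality-adjusted alignment research output. Mirrors the uncalibrated guesses
    in the text; for illustration only."""
    if monthly_multiplier is not None:
        if monthly_multiplier >= 100:  # 100x month-over-month
            return 0.99
        if monthly_multiplier >= 30:   # 30x month-over-month
            return 0.95
        if monthly_multiplier >= 5:    # 5x month-over-month
            return 0.60
    if yearly_multiplier is not None:
        if yearly_multiplier >= 4:     # 300% year-over-year improvement, read here as 4x
            return 0.25
        if yearly_multiplier >= 1.5:   # 50% year-over-year improvement (1.5x)
            return 0.10
    return 0.0  # below every threshold: no strong case for merging yet

# Example: alignment research output grew 6x over the previous month.
print(merge_inclination(monthly_multiplier=6.0))  # -> 0.6
```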
The world where this matters a lot
It’s a somewhat common belief among people I talk to that AGI developers are going to pause when we start getting seriously dangerous capabilities (e.g., biological weapon development abilities). I’m somewhat hopeful that this happens, but I also want to prepare for a world where we more or less blow past these capabilities.
These dangerous capabilities are, as far as I know, the current Schelling point for pausing. If we blow right past them, then there are no more coordination points. It seems pretty reasonable (>10%) to expect that we’ll face most of the catastrophic misalignment risk fairly far into capabilities advances (e.g., after human-level AIs), which would make it more likely that we blow past the obvious dangerous capability thresholds — if risks of misalignment are minimal, it is easier to develop and deploy powerful systems because you have one fewer threat to worry about. The safety-based coordination points discussed recently are a good idea, but we should also have a contingency plan in case they are obsoleted.
Concisely, we might be in a world where we, in effect, blow past the dangerous capability thresholds because there don’t appear to be major misalignment issues in our models, and misuse risks can be patched. If that is the case, the safety-based coordination points currently being discussed in AI Governance are no longer relevant.
Conclusion
It’s pretty unclear what’s going to happen in (maybe) 2-3 years when we start hitting seriously dangerous capabilities, but I think it’s reasonable to prepare for a future where AI capabilities continue to advance without us dying immediately. Given that world, we are in need of coordination points. I’ve attempted to lay out one specific type of coordination, consolidation, that may be particularly useful for solving key problems in existential security.
Potential issues
AGI labs may not want to merge. Yeah, they might not, and there might be losers from merging, but I expect this happens late enough in the game that all the AGI companies are “winners” and are willing to put humanity’s survival first. I’m not particularly confident here, but I’m willing to spend optimism points on it.
Legal barriers to merging. I’m also fine spending optimism points here. I think it’s plausible that governments see the risks and endorse consolidation.
Consolidation speeds things up. Yeah, that’s kinda what we’re aiming for, in that we want safety projects to be sped up. Consolidation that doesn’t include the leading actor (e.g., the #2 and #3 AGI labs merging) is probably bad because it furthers racing.
Consolidated power seems dangerous. Yep, it sure does. It also seems like the only feasible path out of this mess. I would be excited if there existed a better way, but I currently do not expect there is. There should obviously be significant effort put into making the merged AGI project work toward the benefit of all of humanity and share its spoils.
International coordination may be difficult. Yep, it sure might, though I’m not sure it gets any worse with plans for coordination than without them.
Random
Once existential security is achieved, facilitating a positive transition to a post-AGI and post-work world could be a difficult project. It seems like this may be better facilitated by a central governing body than by traditional economic markets. A merged AGI lab could be a good starting point to spin off a wing focused on this transition; for instance, it might work on: curing diseases, offering a universal basic income to all humans, ensuring value is distributed across society, setting up legal processes for dealing with digital minds, and becoming the new dominant structure for organizing society (i.e., countries may no longer be a useful institution and may be replaced). I would be excited for more research on post-AGI governance.
I think there’s a legitimate worry that AGI labs will race toward a consolidation point, trying to get there first in order to force other labs to get acquired. There are a bunch of objectives that it would be bad for AGI labs to race toward, e.g., the ability to synthesize a virus capable of wiping out humanity. One convenient aspect of the consolidation Schelling points is that they can be structured around goals we want the AI labs to race toward, like producing high-quality alignment research. Imagine a world where all the AGI labs were racing to produce the best alignment research possible because the winner of an annual contest got 5% of everybody else’s GPUs for the next year, or got to absorb the other companies!
[1] “it's not going to be such a crystal clear moment. It's going to be a much more gradual transition. It'll be what people call a “slow takeoff.” And no one is going to agree on what the moment was when we had the AGI.” Source
[2] This is similar to the dichotomy of action and inaction risks discussed by Karnofsky, in which AGI labs have to balance risks from deploying their own catastrophically misaligned systems (action risks) with the risk that another actor may deploy such systems (inaction risk). Pauses can be thought of as reducing action risk and, when done in a coordinated manner, reducing inaction risk as well. Consolidation is primarily a way to reduce inaction risk by reducing the number of actors that could unilaterally develop or deploy misaligned AGI, and it may also reduce action risk by allowing more resources to go into safety per development effort. Many of the things I want a merged AGI lab to do come with significant action risk, but the reduced race pressures from fewer actors will hopefully be sufficient to allow this merged lab to make good decisions.