9/25/2025
AGI by 2030: Plausible or Probable?

In this fourth post of our AI Series, I share my critical response to Google DeepMind's "An Approach to Technical AGI Safety and Security," published in April 2025, which famously predicts that Artificial General Intelligence (AGI) will be here by 2030.

By Bill Miller
Introduction

This essay responds to Google DeepMind's 145-page roadmap, "An Approach to Technical AGI Safety and Security" (April 2025). In that document, the authors describe artificial general intelligence (AGI), meaning systems that can match or exceed capable humans across most economically relevant tasks, as "plausible" on a near-term horizon, widely reported as around 2030. Their case rests on the idea that progress can compound across many improvement channels at once, not just by making the core model bigger. In DeepMind's framing, these "many levers" include: better and broader training data; multimodal training (text, images, audio, video); tool use and agentization (letting models call software tools or services); retrieval and memory (looking up information instead of memorizing it); and systems-level engineering plus evaluation-driven training. Together with continued growth in computation and algorithmic efficiency, they argue, these channels can plausibly add up to AGI within the decade.

My thesis is that, even with rapid progress, "plausible by 2030" overstates what today's trajectory can credibly deliver. The apparent "many levers" are tightly correlated: they draw on the same physical, economic, and scientific bottlenecks, and when constraints move together, the number of effective levers shrinks. Several remaining gaps also look less like engineering polish and more like unsolved research problems, the kind that historically yield to unpredictable breakthroughs rather than steady extrapolation.
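To see why the compounding framing is so seductive, here is a toy sketch. The multipliers are invented for illustration and are not DeepMind's figures; the point is only that independent gains would multiply:

    # Toy sketch of the "many levers" compounding argument.
    # Multipliers are invented for illustration; they are not
    # DeepMind's numbers.
    levers = {
        "more compute": 2.0,
        "better / broader data": 1.5,
        "algorithmic efficiency": 1.8,
        "tool use and agents": 1.4,
        "retrieval and memory": 1.3,
    }

    effective_gain = 1.0
    for name, multiplier in levers.items():
        effective_gain *= multiplier
        print(f"{name:>24}: x{multiplier:.1f}  (cumulative x{effective_gain:.2f})")

    # If the levers were independent, the gains would multiply to ~9.8x.
    # The essay's counterargument: they are not independent, because
    # each one draws on the same electricity, packaging, and data.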
Why the "many levers" collapse in practice

1) Chips and electricity are the binding pair

Power and siting. The best available public projections show data-centre electricity demand roughly doubling by 2030, and grid operators in several regions report long interconnection queues and local constraints. This is the same electricity needed to run longer contexts, more tools, and heavier oversight: every lever draws on one pool.

Advanced packaging and memory. Scaling any frontier stack depends on high-bandwidth memory (HBM) and advanced packaging to feed models quickly. Even after announced expansions, industry leaders continue to describe packaging as a bottleneck. If you cannot package HBM stacks fast enough, you cannot deploy the compute that every other lever presumes.

2) Data quantity and quality are a shared ceiling

Independent analyses estimate that the stock of high-quality public human text will be substantially tapped within the decade if current scaling continues. That pushes developers toward synthetic data, models learning from model-generated text, which carries documented risks of "model collapse" unless mitigated with careful curation and mixing. This single constraint pinches "data quality," "multimodal coverage," and "evaluation-driven training" at once; a toy sketch at the end of this section illustrates the collapse mechanism.

3) Long context isn't long-range reasoning

Stretching context windows does not, by itself, produce reliable long-range reasoning. Empirical work shows models often fail to use information buried mid-context ("lost in the middle"). External retrieval helps, but it brings latency, correctness, and security trade-offs, and, again, more power.

4) Agentization raises the bill of materials

Turning assistants into agents that browse, click, run code, read logs, or modify configurations can amplify capability, but it also expands the attack surface and the operating cost. Mainstream security guidance now treats prompt injection, insecure output handling, and training-data poisoning as first-class risks, and managing them requires isolation, supervision, and logging at scale: in other words, more compute and more careful engineering. A second sketch at the end of this section shows one such guardrail in miniature.
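On the synthetic-data risk in (2): the sketch below is a toy of the variance-loss mechanism behind "model collapse," not a claim about any production pipeline. Each "generation" fits a Gaussian to a finite sample drawn from the previous generation's fit, and the fitted spread tends to drift downward, the statistical analogue of a model losing the tails of real data:

    import random
    import statistics

    # Toy "model collapse": each generation trains (fits a Gaussian) on
    # samples produced by the previous generation's model. With finite
    # samples, the fitted spread tends to drift downward over
    # generations, so the tails of the original data are gradually lost.

    random.seed(0)
    SAMPLE_SIZE = 20      # small on purpose, to make the drift visible
    GENERATIONS = 100

    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    for gen in range(1, GENERATIONS + 1):
        # Draw a finite synthetic dataset from the current model...
        data = [random.gauss(mu, sigma) for _ in range(SAMPLE_SIZE)]
        # ...and fit the next generation's model to it.
        mu = statistics.fmean(data)
        sigma = statistics.stdev(data)
        if gen % 20 == 0:
            print(f"generation {gen:3d}: fitted sigma = {sigma:.3f}")

    # Most runs show sigma well below 1.0 by generation 100 (the walk is
    # noisy; rerun with other seeds to see the spread). Curation and
    # mixing in fresh human data are the standard mitigations.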
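On the supervision costs in (4): a minimal sketch of one guardrail layer, an allowlisted tool dispatcher with an audit log. The tool names and policy here are hypothetical, and real deployments add sandboxing, output validation, and human review on top:

    import logging

    # Minimal sketch of supervised tool dispatch for an agent loop.
    # Tool names and policy are hypothetical examples.
    logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s",
                        level=logging.INFO)
    audit = logging.getLogger("agent.audit")

    ALLOWED_TOOLS = {"search_docs", "read_file"}  # explicit allowlist

    def dispatch(tool_name: str, argument: str) -> str:
        """Run a model-proposed tool call only if policy allows it; log everything."""
        audit.info("tool requested: %s(%r)", tool_name, argument)
        if tool_name not in ALLOWED_TOOLS:
            audit.warning("blocked: %r is not on the allowlist", tool_name)
            return f"ERROR: tool '{tool_name}' requires human approval"
        # A real implementation would sandbox execution here and validate
        # the output before it re-enters the model's context, since tool
        # results are a prime channel for prompt injection.
        return f"(result of {tool_name} on {argument!r})"

    # A model-proposed action is checked, logged, and possibly refused:
    print(dispatch("search_docs", "HBM packaging capacity 2030"))
    print(dispatch("run_shell", "rm -rf /"))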
Why progress feels smooth when it's actually stacked S-curves

Recent leaps in long-horizon performance look "super-exponential" because we have stacked discrete breakthroughs on top of raw scaling: longer contexts, tool use, retrieval, and evaluation-driven training have each contributed their own S-curve, and while several curves climb at once, their sum reads as one smooth trend.
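A toy calculation, with invented parameters, makes the point: model capability as a sum of staggered logistic curves, one per discrete breakthrough, each unlocking more than the last. Growth stays strong while new curves keep arriving, then stalls when they stop:

    import math

    # Toy model: capability as a sum of staggered S-curves (logistics),
    # one per breakthrough. All parameters are invented for illustration.

    def logistic(t: float, midpoint: float, steepness: float = 1.0) -> float:
        return 1.0 / (1.0 + math.exp(-steepness * (t - midpoint)))

    # (arrival time, size): each breakthrough unlocks more than the last
    BREAKTHROUGHS = [(2.0, 1.0), (5.0, 3.0), (8.0, 9.0), (11.0, 27.0)]

    def capability(t: float) -> float:
        return sum(size * logistic(t, mid) for mid, size in BREAKTHROUGHS)

    prev = capability(0.0)
    for t in range(1, 17):
        cur = capability(float(t))
        print(f"t={t:2d}  capability={cur:6.2f}  growth={(cur - prev) / prev:6.1%}")
        prev = cur

    # Growth stays strong while new curves keep arriving, then collapses
    # once the last one saturates. Nothing in the smooth stretch
    # predicted the stall, which is the risk in extrapolating it.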
A brief note on Bostrom’s “Superintelligence” (2014)
Nick Bostrom's book "Superintelligence: Paths, Dangers, Strategies" (2014) profoundly shaped policy and public discourse. It popularized ideas such as the "orthogonality thesis" (high intelligence does not imply benevolent goals), "instrumental convergence" (power-seeking and resource acquisition as common means), and "fast takeoff" via recursive self-improvement. That framing helped launch the safety and governance conversation a decade ago, a major contribution. With ten more years of evidence, however, several assumptions behind the most dystopian forecasts warrant updating. Capability gains to date have come from scaling compute, data, and algorithms, not from systems rapidly rewriting themselves into runaway superintelligences. Today's agents are brittle and heavily scaffolded; serving frontier capability at population scale is constrained by electricity, high-bandwidth memory, advanced packaging, and network capacity; and evaluation and oversight remain noisy and gameable. In other words, the empirical picture looks less like a single, explosive "takeoff" and more like stacked S-curves under shared physical and economic ceilings. Bostrom's central caution, that misaligned, powerful systems could cause catastrophic harm, still deserves attention; but the most credible near-to-mid-term risks arise from concentrated access, misuse by capable actors, and socio-technical integration failures, rather than an instantaneous, unconstrained leap to superintelligence.

A note on the "7-hour task" claims

OpenAI's recent Codex update reports an agentic coding model that worked independently for more than seven hours on large software tasks and ultimately delivered a successful implementation. That is impressive, but without denominators (success rate, lines of code, defect density, human-equivalent time), "7 hours" is an operational anecdote, not a guarantee of robust autonomy on open-ended work. It shows potential for long-running agent loops, not that the hard parts of planning, error recovery, and verification are solved.

What this implies for "2030 is plausible"

DeepMind's case assumes inputs (computation, data, and algorithmic efficiency) keep scaling and that many levers (agents, retrieval, systems engineering, evaluation-driven training) compound in parallel. But those levers draw on the same wells: electricity, advanced packaging and memory, high-quality data, and human oversight capacity. As those constraints tighten together, the effective degrees of freedom drop.

A simple intuition pump: suppose five improvements must all arrive by 2030: bigger base models, better data, long context that actually works, robust agents, and stronger oversight. If each had an 80% chance and the five were independent, the chance that all five arrive would be about one third (0.8^5 ≈ 0.33). But they are not five independent bets; they are positively correlated through shared bottlenecks (chips, power, data, engineering), so a single shortfall in a shared input can knock out several at once. Correlation concentrates outcomes at the extremes: the scenario where everything lands gets somewhat likelier, but so does the scenario where everything stalls together, which independence would call vanishingly rare. "Many levers" therefore buys far less robustness than it appears to, and each 80% marginal is itself optimistic, since it quietly assumes the shared constraints ease. That is why "plausible by 2030" is too strong as a forecast of incremental evolution from today's methods, especially when history says real unlocks tend to be sporadic and unpredictable. The simulation sketched below makes the correlation effect concrete.
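Here is a minimal Monte Carlo sketch of that correlation effect, with invented numbers. Each lever needs a shared input (think power or packaging) to come through, plus an independent lever-specific success; marginal probabilities are calibrated to 80% in both scenarios, so only the correlation structure differs:

    import random

    random.seed(0)
    TRIALS = 200_000
    N_LEVERS = 5

    def simulate(shared_prob: float, idio_prob: float) -> tuple[float, float]:
        """Return P(all levers succeed), P(all levers fail)."""
        all_ok = all_fail = 0
        for _ in range(TRIALS):
            shared = random.random() < shared_prob  # one draw for the shared input
            hits = [shared and random.random() < idio_prob for _ in range(N_LEVERS)]
            all_ok += all(hits)
            all_fail += not any(hits)
        return all_ok / TRIALS, all_fail / TRIALS

    # Independent case: marginal success 0.8 per lever, no shared input.
    ind_ok, ind_fail = simulate(1.0, 0.8)
    # Correlated case: shared input clears with p=0.85; idiosyncratic
    # part is 0.8/0.85 ≈ 0.941, so each lever's marginal is still 0.8.
    cor_ok, cor_fail = simulate(0.85, 0.8 / 0.85)

    print(f"independent: P(all 5 arrive)={ind_ok:.3f}  P(all 5 fail)={ind_fail:.5f}")
    print(f"correlated:  P(all 5 arrive)={cor_ok:.3f}  P(all 5 fail)={cor_fail:.5f}")

    # With identical 80% marginals, the shared bottleneck fattens the
    # total-failure tail from roughly 0.0003 to roughly 0.15, a ~500x
    # jump in the chance that everything stalls together.

Note that correlation also raises the all-arrive probability, so the point is not that 2030 is impossible; it is that the forecast hinges on one or two shared inputs rather than five independent chances, and the total-stall scenario is hundreds of times likelier than the independence picture suggests.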
The likely deployment reality: rationed first, then distilled

If AGI arrives soon, it will be rationed, concentrated in a few labs and perhaps governments, because serving it at national or global scale will be limited by power, chips, and networking, not just software. What happens next is predictable: those labs will work to distill the capability into smaller, cheaper models to push costs down enough for mass markets. That path is real, but it assumes the breakthroughs appear and that the physical build-out keeps up.

Today's technology is powerful, and under-engineered into business

All of this caution does not diminish the power of today's systems; they are already transforming workflows. The practical problem is integration: tooling, security, and process design lag behind model capability. Rapid iteration creates uncertainty, since startups can build toward a genuine market need only to be steamrolled by a frontier release a year later. Executives should pursue "no-regrets" adoption: start with narrow, high-ROI applications under strong guardrails, invest in data and process hygiene, and build the organizational muscle to adapt as the frontier shifts.

Policy and strategy takeaways
DeepMind is right that there are multiple channels for progress beyond "make the model bigger." But those channels share the same scarce resources and unresolved science. Electricity, advanced packaging and memory, high-quality data, and secure agent operation are not side issues; they are the issues. Given those correlated constraints, and the historical fact that the crucial breakthroughs arrive unpredictably, "plausible by 2030" looks less like a neutral forecast and more like an optimistic scenario: worth planning for, but unwise to bank on.