FirstMile Ventures

The FirstMile Blog
The latest in tech from the Rockies to the Rio Grande

9/25/2025

AGI by 2030: Plausible or Probable?

 
In this fourth post of our AI series, I share my critical response to Google DeepMind's "An Approach to Technical AGI Safety and Security," published in April 2025, which argues that Artificial General Intelligence (AGI) is plausible by 2030.

By Bill Miller
TL;DR
  • DeepMind argues AGI is plausible by 2030 through many “levers” (bigger models, better data, agents, longer context, stronger oversight).
  • In practice, those levers are tightly correlated—most depend on the same scarce resources: chips, electricity, high-quality data, and engineering.
  • Today’s gains look like stacked S-curves, not limitless exponential progress.
  • Key gaps—long-horizon planning, persistent memory, scalable interpretability—need real breakthroughs, not just more scaling.
  • Even if AGI appeared, serving it at population scale would be limited by power, packaging, and networking.
  • Expert surveys still put 50% odds for full human-level AI in the 2040s, not 2030.
  • Takeaway: treat “2030” as a risk scenario worth preparing for—not as the baseline forecast.

Introduction
This essay responds to Google DeepMind's 145-page roadmap, "An Approach to Technical AGI Safety and Security" (April 2025). In that document, the authors describe artificial general intelligence (AGI), meaning systems that can match or exceed capable humans across most economically relevant tasks, as "plausible" on a near-term horizon, widely reported as around 2030. Their case rests on the idea that progress can compound across many improvement channels at once, not just by making the core model bigger.
In DeepMind’s framing, these “many levers” include: better and broader training data; multimodal training (text, images, audio, video); tool‑use and agentization (letting models call software tools or services); retrieval and memory (looking up information instead of memorizing it); and systems‑level engineering plus evaluation‑driven training.  Together with continued growth in computation and algorithmic efficiency, they argue these channels can plausibly add up to AGI within the decade.

My thesis is that, even with rapid progress, “plausible by 2030” overstates what today’s trajectory can credibly deliver.  The apparent “many levers” are tightly correlated: they draw on the same physical, economic, and scientific bottlenecks.  When constraints move together, the number of effective levers shrinks.  Several remaining gaps also look less like engineering polish and more like unsolved research problems that historically yield to unpredictable breakthroughs, not to steady extrapolation.

Up-front clarification
  • Even if a powerful AGI appeared tomorrow, the world could not serve it at population scale. Training is not the whole game; inference (answering and acting for millions of users in real time) dominates long-run cost and power. Global data-centre electricity demand is expected to roughly double by 2030, with artificial-intelligence workloads a principal driver. Grids already face interconnection backlogs and local hotspots where data-centre demand materially moves power markets. These facts constrain any scenario in which "everyone" gets AGI on tap.
  • Survey trends do not bail out a 2030 date.  The widely cited 2023 expert survey of thousands of AI authors places the 50% estimate for systems outperforming humans at all tasks in the 2040s.  Timelines have drifted earlier, but uncertainty remains high; importantly, “earlier” does not mean “2030.”
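To make the inference-dominates point concrete, here is a deliberately crude back-of-envelope sketch. Every number in it (user count, tokens per day, joules per token) is an illustrative assumption I chose for round figures, not a measurement:

```python
def avg_power_gw(users, tokens_per_user_day, joules_per_token):
    """Average continuous grid draw in gigawatts under a toy linear model."""
    joules_per_day = users * tokens_per_user_day * joules_per_token
    return joules_per_day / 86_400 / 1e9  # seconds per day, then W -> GW

# Two hypothetical serving scenarios, both assuming 1 J per generated token:
chat  = avg_power_gw(1_000_000_000, 10_000,    1.0)  # casual chat usage
agent = avg_power_gw(1_000_000_000, 1_000_000, 1.0)  # always-on agent workloads
print(f"chat ~{chat:.2f} GW, agents ~{agent:.1f} GW continuous")
```

Even if every assumed figure is off by a factor of a few, the linear scaling is the point: moving a billion users from occasional chat to always-on agents multiplies the continuous draw a hundredfold, into grid-scale territory.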

Why the "many levers" collapse in practice
1) Chips and electricity are the binding pair
Power and siting. The best available public projections show data-centre electricity demand roughly doubling by 2030. Grid operators in several regions report long interconnection queues and local constraints. This is the same electricity you'd use to run longer contexts, more tools, and heavier oversight: all of it.
Advanced packaging and memory.  Scaling any frontier stack depends on high‑bandwidth memory (HBM) and advanced packaging to feed models quickly.  Even after announced expansions, industry leaders continue to describe packaging as a bottleneck.  If you can’t package HBM stacks fast enough, you can’t deploy the compute that every other lever presumes.

2) Data quantity and quality are a shared ceiling
Independent analyses estimate that the stock of high-quality public human text will be substantially tapped within the decade if current scaling continues. That pushes developers toward synthetic data (models learning from model-generated text), which carries documented risks of "model collapse" unless mitigated with careful curation and mixing. This single constraint pinches "data quality," "multimodal coverage," and "evaluation-driven training" at once.
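The "model collapse" failure mode has a simple toy illustration: repeatedly fit a distribution to samples drawn from the previous fit, and the spread decays. The sketch below (a one-dimensional Gaussian stand-in, nothing like a real training loop) shows each "generation" learning from the previous generation's output:

```python
import random

def refit(sigma, n, rng):
    """Draw n samples from N(0, sigma) and return the MLE of sigma."""
    samples = [rng.gauss(0.0, sigma) for _ in range(n)]
    mean = sum(samples) / n
    return (sum((x - mean) ** 2 for x in samples) / n) ** 0.5

rng = random.Random(0)
sigma = 1.0
for generation in range(200):  # each generation trains only on the last one's output
    sigma = refit(sigma, n=20, rng=rng)
print(f"std after 200 generations of self-training: {sigma:.5f}")
```

The small estimation bias compounds generation after generation, and the distribution's tails vanish. Real mitigations (mixing in fresh human data, careful curation) work precisely by breaking this closed loop.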

3) Long context isn't long‑range reasoning
Stretching context windows does not, by itself, produce reliable long-range reasoning. Empirical work shows models often fail to use information buried mid-context ("lost in the middle"). External retrieval helps, but it brings latency, correctness, and security trade-offs, and, again, more power.

4) Agentization raises the bill of materials
Turning assistants into agents that browse, click, run code, read logs, or modify configurations can amplify capability, but it also expands the attack surface and operating cost. Mainstream security guidance now treats prompt injection, insecure output handling, and training-data poisoning as first-class risks, which require isolation, supervision, and logging at scale; in other words, more compute and more careful engineering.

Where breakthroughs, not "more of the same," are likely required
  • Reliable long-horizon planning. Surveys converge on a sober point: today's large models are not reliable standalone planners for multi-day, multi-step, error-recovering work, even if they help as components. This is a research program (credit assignment and hierarchical control), not just a knob you turn.
  • Persistent, structured memory and world‑models.  General intelligence needs state: what happened, what matters, and what to carry forward.  Longer windows and ad‑hoc retrieval have not yet yielded durable, auditable world‑models.
  • Scalable interpretability and verification.  For high‑stakes use, we need to audit internal computations, not only check outputs.  Emerging work is promising but candid about coverage and engineering limits at frontier scales.  Until this matures, evaluation‑driven training risks “teaching to the test.”
  • Evaluation that resists gaming and contamination.  Newer, contamination‑resistant benchmarks often show lower, more realistic scores than legacy sets, reminding us that leaderboard gains can overstate real‑world competence.

Why progress feels smooth when it’s actually stacked S‑curves
Recent leaps in long‑horizon performance look “super‑exponential” because we have stacked discrete breakthroughs on top of raw scaling:
  • Scale: bigger models.
  • "Chain-of-thought" prompting: asking the model to show its steps, often a qualitative improvement on reasoning tasks.
  • Agents: reasoning‑and‑acting loops that let models call tools.
Each of those is its own S‑curve: early lift, rapid gains, then leveling off as hidden constraints assert themselves.  Stacking S‑curves can look like one smooth exponential, but every layer inherits the same bottlenecks in chips, power, data, and security.
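The stacked-S-curve picture is easy to see numerically. The sketch below (illustrative logistic curves with invented midpoints standing in for scale, chain-of-thought, and agents) sums three staggered S-curves: while new curves keep arriving, the total grows briskly; once the last one saturates, growth collapses even though nothing visibly "broke":

```python
import math

def s_curve(t, midpoint, rate=1.0, height=1.0):
    """A single logistic S-curve: slow start, rapid middle, plateau."""
    return height / (1.0 + math.exp(-rate * (t - midpoint)))

def stacked(t):
    # Three staggered breakthroughs, arriving at t = 2, 5, and 8.
    return s_curve(t, 2) + s_curve(t, 5) + s_curve(t, 8)

early = stacked(6) - stacked(4)    # growth while new curves are still arriving
late  = stacked(14) - stacked(12)  # growth after the last curve saturates
print(f"growth from t=4 to 6: {early:.3f}; from t=12 to 14: {late:.4f}")
```

During the stacking phase the sum is hard to distinguish from a smooth exponential; afterwards, progress depends entirely on whether a new curve shows up.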
A brief note on Bostrom’s “Superintelligence” (2014)
Nick Bostrom's book "Superintelligence: Paths, Dangers, Strategies" (2014) profoundly shaped policy and public discourse. It popularized ideas such as the "orthogonality thesis" (high intelligence does not imply benevolent goals), "instrumental convergence" (power-seeking and resource acquisition as common means), and "fast takeoff" via recursive self-improvement. That framing helped launch the safety and governance conversation a decade ago, a major contribution.

With ten more years of evidence, several assumptions behind the most dystopian forecasts warrant updating. Capability gains to date have come from scaling compute, data, and algorithms, not from systems rapidly rewriting themselves into runaway superintelligences. Today's agents are brittle and heavily scaffolded; serving frontier capability at population scale is constrained by electricity, high-bandwidth memory, advanced packaging, and network capacity; and evaluation and oversight remain noisy and gameable. In other words, the empirical picture looks less like a single, explosive "takeoff" and more like stacked S-curves under shared physical and economic ceilings. Bostrom's central caution (that misaligned, powerful systems could cause catastrophic harm) still deserves attention; but the most credible risks on near-to-mid-term horizons arise from concentrated access, misuse by capable actors, and socio-technical integration failures, rather than an instantaneous, unconstrained leap to superintelligence.

A note on the “7‑hour task” claims
OpenAI's recent Codex update reports an agentic coding model that worked independently for more than seven hours on large software tasks and ultimately delivered a successful implementation. That is impressive, but without denominators (success rate, lines of code, defect density, human-equivalent time), "7 hours" is an operational anecdote, not a guarantee of robust autonomy on open-ended work. It shows potential for long-running agent loops, not that the hard parts of planning, error recovery, and verification are solved.

What this implies for “2030 is plausible”
DeepMind's case assumes that inputs (computation, data, and algorithmic efficiency) keep scaling and that many levers (agents, retrieval, systems engineering, evaluation-driven training) compound in parallel. But those levers draw on the same wells: electricity, advanced packaging and memory, high-quality data, and human oversight capacity. As those constraints tighten together, the effective degrees of freedom drop.

A simple intuition pump: suppose five improvements must arrive by 2030 (bigger base models, better data, long context that actually works, robust agents, and stronger oversight), each with an 80% chance in isolation. If the five were independent, all of them would arrive only about one time in three, but at least one would almost certainly land, so some front would keep advancing. Shared bottlenecks (chips, power, data, engineering) destroy that diversification: a shortfall in the common inputs drags all five chances down together, so the scenario in which little improves at all stops being negligible. The "many levers" argument quietly assumes a portfolio effect that correlated constraints take away. That is why "plausible by 2030" is too strong as a forecast of incremental evolution from today's methods, especially when history says real unlocks tend to be sporadic and unpredictable.
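The intuition can be made concrete with a toy common-factor model. All structure and numbers here are assumptions, chosen only so that each lever keeps an 80% marginal chance of success:

```python
p_marginal = 0.80   # assumed chance of each lever succeeding, viewed in isolation
n_levers = 5

# Independent case: failures diversify away.
all_indep = p_marginal ** n_levers             # all five arrive
any_indep = 1 - (1 - p_marginal) ** n_levers   # at least one arrives

# Correlated case: every lever also needs a shared bottleneck
# (chips / power / data) to hold, with assumed probability 0.85.
p_bottleneck = 0.85
p_given_ok = p_marginal / p_bottleneck         # per-lever odds when supply holds
all_corr = p_bottleneck * p_given_ok ** n_levers
any_corr = p_bottleneck * (1 - (1 - p_given_ok) ** n_levers)

print(f"all five:     indep {all_indep:.2f} vs correlated {all_corr:.2f}")
print(f"at least one: indep {any_indep:.4f} vs correlated {any_corr:.4f}")
```

The correlated case clusters outcomes: conditional on the bottleneck holding, most levers come through together, but the chance that essentially nothing improves jumps from a fraction of a percent to the full 15% chance of the bottleneck failing. The portfolio effect is gone.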

The likely deployment reality: rationed first, then distilled
If AGI arrives soon, it will be rationed, concentrated in a few labs and perhaps governments, because serving it at national or global scale will be limited by power, chips, and networking, not just software. What happens next is predictable: those labs will work to distill the capability into smaller, cheaper models to push costs down enough for mass markets. That path is real, but it assumes the breakthroughs appear and that the physical build-out keeps up.
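Distillation itself is well-trodden ground (Hinton et al., 2015, in the references below). A minimal sketch of the soft-target loss at its core, with hypothetical logits invented for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T exposes the teacher's 'dark knowledge'."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return sum(ti * math.log(ti / si) for ti, si in zip(t, s))

teacher      = [4.0, 1.0, 0.2]  # hypothetical frontier-model logits for one input
good_student = [3.5, 1.2, 0.1]  # mimics the teacher's relative preferences
bad_student  = [0.1, 3.5, 1.2]  # same magnitudes, wrong ordering
print(distill_loss(teacher, good_student), distill_loss(teacher, bad_student))
```

Training a small model to minimize this loss across many inputs is how capability gets pushed down-market; it is cheap relative to training, but it still presumes the big model exists and can be served long enough to teach.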

Today's technology is powerful, yet under-engineered into business
All of this caution does not diminish the power of today's systems. They are already transforming workflows. The practical problem is integration: tooling, security, and process design lag behind model capability. Rapid iteration creates uncertainty: startups can build toward a genuine market need only to be steamrolled by a frontier release a year later. Executives should pursue "no-regrets" adoption: start with narrow, high-ROI applications under strong guardrails, invest in data and process hygiene, and build the organizational muscle to adapt as the frontier shifts.

Policy and strategy takeaways
  • Plan for uncertainty, not a date.  Treat “2030” as a risk scenario, not a base case.  Build strategies resilient to shorter and longer timelines; the median expert view still lands in the 2040s, with wide dispersion.
  • Compete where it matters: chips and electricity. If there is a national race, it is the race to build power, grid interconnections, advanced packaging, and memory supply: the rate-limiters on everything else.
  • Back fundamental research. Prioritize long-horizon planning, structured memory/world-models, and scalable interpretability: the areas least likely to yield to brute-force scaling but most necessary for safe autonomy.
  • Treat evaluation as governance, not marketing.  Favor contamination‑resistant, refreshed, third‑party tests and resist teaching to the test.
  • Adopt today’s AI with guardrails.  Focus on secure‑by‑design patterns for agentic systems (least privilege, isolation, logging), clear provenance for data and prompts, and continuous monitoring.
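As a concrete flavor of "least privilege, isolation, logging" for agentic systems, here is a minimal hypothetical tool gateway; the tool names and registry shape are invented for illustration, not drawn from any particular framework:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-gateway")

# Least privilege: the agent may only invoke tools on an explicit allowlist.
ALLOWED_TOOLS = {
    "search_docs": lambda query: f"results for {query!r}",  # read-only
    "get_ticket":  lambda tid: f"ticket {tid}",             # read-only
    # deliberately absent: shell access, config writes, outbound email
}

def invoke_tool(name, *args):
    """Gate every model-requested tool call through the allowlist, with an audit log."""
    if name not in ALLOWED_TOOLS:
        log.warning("denied tool call: %s%r", name, args)
        raise PermissionError(f"tool {name!r} is not allowlisted")
    log.info("tool call: %s%r", name, args)
    return ALLOWED_TOOLS[name](*args)

print(invoke_tool("search_docs", "interconnection queue"))
# invoke_tool("run_shell", "rm -rf /") would raise PermissionError and be logged.
```

The point is not this particular code but the pattern: capability flows through one audited chokepoint, denials are logged as loudly as successes, and anything dangerous is absent by default rather than blocked by prompt instructions.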
Bottom line
DeepMind is right that there are multiple channels for progress beyond "make the model bigger." But those channels share the same scarce resources and unresolved science. Electricity, advanced packaging and memory, high-quality data, and secure agent operation are not side issues; they are the issues. Given those correlated constraints, and the historical fact that the crucial breakthroughs arrive unpredictably, "plausible by 2030" looks less like a neutral forecast and more like an optimistic scenario: worth planning for, but unwise to bank on.


References (selected)
  • Google DeepMind, “An Approach to Technical AGI Safety and Security” (April 2025).  https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/evaluating-potential-cybersecurity-threats-of-advanced-ai/An_Approach_to_Technical_AGI_Safety_Apr_2025.pdf
  • Evaluating Frontier Models for Stealth and Situational Awareness (2025).  https://arxiv.org/pdf/2505.01420.pdf
  • International Energy Agency, Electricity 2024 - Data Centres and AI.  https://www.iea.org/reports/electricity-2024
  • International Energy Agency, Data centres and data transmission networks (overview).  https://www.iea.org/energy-system/buildings/data-centres-and-data-transmission-networks
  • Epoch AI, “Will we run out of data?” (analysis of data scaling).  https://epochai.org/blog/will-we-run-out-of-data
  • Liu et al., “Lost in the Middle: How Language Models Use Long Context” (2023).  https://arxiv.org/abs/2307.03172
  • OWASP Top 10 for LLM Applications (security guidance).  https://owasp.org/www-project-top-10-for-large-language-model-applications/
  • Wei et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models” (2022).  https://arxiv.org/abs/2201.11903
  • Yao et al., “ReAct: Synergizing Reasoning and Acting in Language Models” (2022).  https://arxiv.org/abs/2210.03629
  • Hinton et al., “Distilling the Knowledge in a Neural Network” (2015).  https://arxiv.org/abs/1503.02531
  • AI Impacts, “2023 Expert Survey on Progress in AI” (summary of timelines).  https://aiimpacts.org/2023-expert-survey-on-progress-in-ai/
  • OpenAI, “Introducing upgrades to Codex” (long‑running agent coding demo).  https://openai.com/index/introducing-upgrades-to-codex/
  • Bostrom, Nick (2014).  Superintelligence: Paths, Dangers, Strategies.  Oxford University Press.  https://global.oup.com/academic/product/superintelligence-9780198739838
