
The FirstMile Blog
The latest in tech from the Rockies to the Rio Grande

8/8/2025

Mind the Memory Gap: Making AI Truly Useful

 
AI can do remarkable things, but it has one big limitation: the “context window,” or memory barrier. This explains why it can suddenly “forget” what you just told it, and why human-like memory will take time, resources, and innovation. For now, AI is best viewed as a powerful, if sometimes forgetful, assistant, especially when designed with its memory limits in mind.

By Bill Miller

What Is the Memory Barrier?
Imagine you hire a consultant for a long-term project, but they only work for a week at a time. Each week, a new consultant joins, spends time reviewing documents and meeting the team, and just as they start to contribute effectively, they leave. The next consultant starts the process again—more time catching up, less time producing results.

That’s how a context window works. It’s the finite number of words (or “tokens”) the AI can keep in mind at once. When the limit is reached, older parts of the conversation drop out of scope, unless you reintroduce them. Each conversation is like a short-term contract: once it ends, the AI loses memory of the work.
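
To make this concrete, here’s a minimal sketch of how a chat application might decide what stays in scope. The token budget is illustrative, and the words-to-tokens estimate is a rough approximation rather than a real tokenizer:

```python
# A toy illustration of the "short-term contract": keep only as much
# conversation history as fits in a fixed token budget. Real systems count
# tokens with the model's own tokenizer; we approximate 1 word ~ 1.3 tokens.

CONTEXT_BUDGET = 1000  # illustrative; real windows run 128k to 1M tokens


def approx_tokens(text: str) -> int:
    """Rough token estimate: English averages ~1.3 tokens per word."""
    return int(len(text.split()) * 1.3)


def fit_history(messages: list[str], budget: int = CONTEXT_BUDGET) -> list[str]:
    """Keep the most recent messages that fit; older ones fall out of scope."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk backward from the newest turn
        cost = approx_tokens(msg)
        if used + cost > budget:
            break                   # everything older is "forgotten"
        kept.append(msg)
        used += cost
    return list(reversed(kept))     # restore chronological order
```

Everything that falls outside the budget is simply gone unless the application re-injects it, and that re-injection is exactly the “onboarding” cost discussed below.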

How Large Are Today’s AI Memory Windows?
  • Mid-range models (GPT-4o): about 128,000 tokens (~90,000 words).
  • Advanced models (Claude Sonnet 4): about 200,000 tokens (~150,000 words).
  • Leading-edge models (GPT-4.1): 1 million tokens, or several full-length novels in one prompt.
  • Experimental models: some prototypes claim 4–10 million tokens, though not in mainstream production.
Even at a million tokens, this is still short-lived compared to a human’s lifetime of knowledge and experience.

Why the Memory Barrier Matters
Stateless Sessions
Every new conversation starts from scratch. Without reloading prior information (which costs tokens), the AI can’t recall past discussions.

No Continuous Learning
Humans learn continuously; current AI models do not. They don’t update their internal parameters between sessions unless explicitly fine-tuned.

Growing Onboarding Costs
The more complex the project, the more of the AI’s context is consumed just “catching up,” leaving less room for producing new insights.


Turning Limits into Strengths: Retrieval-Augmented Generation (RAG)
Instead of loading an entire project’s history into every interaction, RAG systems store that history in external memory (such as vector databases or knowledge graphs) and retrieve only what’s needed at the right moment. Keeping long-term knowledge outside the model and pulling in just the relevant slice frees the context window for the problem at hand.
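
As a rough sketch of the pattern: the hashed bag-of-words embedding and in-memory list below are toy stand-ins for a real embedding model and a vector database, and the documents are illustrative:

```python
# Minimal RAG sketch: store documents as vectors, retrieve only the closest
# matches, and prepend them to the prompt. In production the embed() function
# would be a learned embedding model and MemoryStore a real vector database.
import numpy as np

DIM = 256


def embed(text: str) -> np.ndarray:
    """Toy embedding: hash each word into a fixed-size, normalized vector."""
    v = np.zeros(DIM)
    for word in text.lower().split():
        v[hash(word) % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v


class MemoryStore:
    def __init__(self):
        self.items: list[tuple[str, np.ndarray]] = []

    def add(self, doc: str):
        self.items.append((doc, embed(doc)))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        scored = sorted(self.items, key=lambda it: -float(it[1] @ q))
        return [doc for doc, _ in scored[:k]]


store = MemoryStore()
store.add("Q3 roadmap: ship the payments API by November.")
store.add("Design review notes: the mobile team prefers GraphQL.")
store.add("Hiring plan: two backend engineers in Denver.")

context = "\n".join(store.retrieve("When does the payments API ship?"))
prompt = f"Context:\n{context}\n\nQuestion: When does the payments API ship?"
```

Only the retrieved lines enter the prompt; the rest of the project’s history stays outside the context window until it is actually needed.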

Advanced RAG Variants
Beyond basic vector retrieval, emerging solutions are blending retrieval with richer, more structured approaches. For example, BurstIQ (a FirstMile Ventures portfolio company) combines knowledge graphs, blockchain, and vector stores to create distributed, trusted, and verifiable data environments. This federated approach enables intelligent, state-aware retrieval for agentic AI and supports persistent, personalized interactions at scale. Other companies, such as Pinecone, Weaviate, Vespa, and Chroma, are pushing high-performance vector storage and retrieval, while graph-based RAG platforms like Neo4j and Kuzu enhance context with semantic relationships. Together, these innovations make AI memory feel far more persistent in practice, easing many of the constraints of the context window.


Agentic Orchestration Frameworks
Agentic orchestration tools turn AI into a collaborative team member, coordinating multi-step reasoning, tool use, and retrieval. Popular frameworks include LangChain, LlamaIndex, AutoGPT, Fixie, and Kamiwaza—the latter also a FirstMile Ventures portfolio company. These systems break complex problems into smaller steps, retrieve relevant context for each step, and then stitch results together into coherent outputs, mitigating attention issues that arise with very large contexts.
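
In skeletal form, the loop these frameworks automate looks something like the sketch below. Both call_llm and retrieve are placeholders standing in for a real chat-completion client and a RAG store, not any particular framework’s API:

```python
# Plan-retrieve-stitch: the basic loop agentic orchestration automates.

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call (e.g., an OpenAI or Anthropic client)."""
    return f"<model output for: {prompt[:48]}...>"


def retrieve(query: str, k: int = 2) -> list[str]:
    """Stand-in for vector/graph retrieval over the project's external memory."""
    return [f"<doc relevant to: {query[:32]}>"] * k


def run_agent(goal: str) -> str:
    # 1. Break the goal into smaller steps (frameworks often ask the model itself).
    steps = call_llm(f"List the sub-tasks needed to: {goal}").split("\n")

    # 2. Retrieve only the context each step needs and solve that step alone,
    #    so no single prompt has to carry the whole project history.
    results = []
    for step in steps:
        context = "\n".join(retrieve(step))
        results.append(call_llm(f"Context:\n{context}\n\nTask: {step}"))

    # 3. Stitch the partial answers into one coherent output.
    return call_llm(f"Combine into a final answer for '{goal}':\n" + "\n".join(results))
```

Because each step sees only its own slice of context, the agent sidesteps the attention degradation that very long single prompts can suffer.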

Parameter-Efficient Fine-Tuning (PEFT)
Methods like LoRA and adapters allow organizations to continuously update models with new data without retraining from scratch. This effectively “personalizes” the model to your project, improving long-term performance and context retention.
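
The core idea behind LoRA is simple enough to sketch: freeze the pretrained weight matrix and train only a small low-rank correction. The dimensions and rank below are illustrative:

```python
# Minimal numpy sketch of the LoRA idea: the original weight W is frozen and
# only the low-rank update B @ A is trained, so fine-tuning touches a tiny
# fraction of the parameters.
import numpy as np

d_in, d_out, rank = 512, 512, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))        # pretrained weight: frozen
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, rank))                   # zero-init: no change at start
scaling = 1.0                                 # often alpha / rank in practice


def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = Wx + scaling * B(Ax); only A and B receive gradient updates."""
    return W @ x + scaling * (B @ (A @ x))


# Trainable parameters: 2 * 8 * 512 = 8,192 vs. 262,144 in W (about 3%).
```

Training roughly 3% of the weights per layer is what makes frequent, project-specific updates economical.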

The Hardware Dimension
Scaling memory isn’t just about algorithms. Modern GPUs like NVIDIA’s B300 and AMD’s MI355 offer up to 288 GB of memory, far beyond earlier generations, allowing much larger context windows and model sizes. Coupled with retrieval strategies, these capabilities make context limits far less constraining in practice.
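
A back-of-envelope calculation shows why that memory matters. The attention KV cache, the model’s working memory, grows linearly with context length; the dimensions below are illustrative, roughly in line with a 70B-class model using grouped-query attention:

```python
# Back-of-envelope: GPU memory needed for the KV cache at long context.
# All model dimensions here are assumptions for illustration.

n_layers = 80
n_kv_heads = 8          # grouped-query attention keeps this small
head_dim = 128
bytes_per_value = 2     # fp16/bf16
context_tokens = 1_000_000

# Factor of 2 covers both keys and values.
kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_tokens
print(f"{kv_cache_bytes / 1e9:.0f} GB")  # ~328 GB for a million-token context
```

At a million tokens, the cache alone can outgrow even a 288 GB card, which is why hardware advances and retrieval strategies have to work together.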

Outlook
1–2 years:
  • RAG and agentic orchestration are widely adopted.
  • 1M+ token APIs become standard in many enterprise workflows.
3–5 years:
  • Models natively handle multi-million-token contexts.
  • Hybrid systems with built-in memory modules become common.
5–10 years:
  • Lifelong learning emerges, with models that update continuously and maintain project-specific knowledge indefinitely.

Why AI Won’t Replace Us—But Will Empower Us
Humans learn continuously, recall selectively, and plan strategically over decades. Today’s AI complements those strengths: it can reason quickly over large but bounded contexts, then hand off to us—or an orchestration layer—for long-term direction. Used well, it becomes a force multiplier, accelerating research, automating repetitive work, and expanding our creative reach.

The memory barrier is real, but not insurmountable. Through advanced RAG (including graph-based and distributed approaches), agentic orchestration frameworks like Kamiwaza (a FirstMile Ventures portfolio company), and continuous fine-tuning, we can extend AI’s effective memory and make it a highly productive partner. The road to human-like, persistent AGI is still ahead, but with the right design choices today, AI can deliver immense value without replacing the humans who guide it.
