One year after Anthropic launched Claude Code, an open-source project called OpenClaw showed what happens when you take that capability and point it at your entire digital life. OpenClaw is a local AI agent framework that runs on your machine, orchestrating models from Anthropic and OpenAI to act on your behalf. It has persistent memory. It learns skills. It reads and writes your files, manages your email, browses the web, runs scripts, and controls apps. Federico Viticci of MacStories burned through 180 million tokens and said there’s “no going back after wielding this kind of superpower.”
OpenClaw is what compounding innovation looks like in practice. Claude Code made it possible. MCP gave it a protocol for understanding systems. And someone built an autonomous agent that does what enterprises have spent millions trying to build, all via an open-source project running on a Mac mini.
The promises of AI aren’t theoretical anymore. Neither are the risks.
Researchers found hundreds of trojan skills on OpenClaw’s marketplace that installed keyloggers and stole credentials. The agent often runs with near-unlimited permissions. If it encounters a malicious prompt hidden in an email, it follows it. As some security practitioners have warned, giving an agent broad permissions can feel like hiring an employee and handing them the keys to your digital kingdom on day one.
The technology works. That was never the real question. The real question is whether your organization can manage what it’s about to unleash.
Your AI Strategy Has a People Problem It Doesn’t Know About
MIT researchers estimate that 95% of enterprise AI pilots fail to deliver measurable returns. The first reaction was to blame the technology: it wasn’t ready, it hallucinates, it can’t handle real complexity. But OpenClaw proves the capability is already here. Microsoft CTO Kevin Scott calls it a “capability overhang”: today’s AI systems contain far more capability than most applications draw upon.
The failure isn’t in the models. It’s in the context.
Everyone can relate to the concept of context; we each process and understand the world through our own. In business, we have always managed context: write a job description, hire an employee, give them access to specific systems, align their work to the organization’s goals, and measure their contribution to the value network. When an employee struggled, it was usually because their personal context and work output didn’t align with the value the enterprise delivered to its broader network of customers, partners, and regulators. We didn’t hand a new hire the keys to every system on day one and say “figure it out.” We governed their access. We managed their context.
Agents are no different. An agent with powerful capabilities but misaligned context will optimize for the wrong things, undermining the organization’s position in its value network. Research is starting to validate this. The Vending-Bench paper tested autonomous agents on long-running tasks and found that agents with misaligned context or goals drifted off course: adjusting pricing without understanding partner relationships, automating compliance without jurisdictional nuance. Context management isn’t just about data access. It’s about aligning the agent to the value the enterprise exists to deliver.
Capability without context alignment isn’t automation. It’s entropy.
Agents need the same discipline. An agent’s context is the combination of the data it can access, the goals it’s given, the tools it can use, and the memory it accumulates. That context determines whether the agent creates value or destroys it. And right now most organizations are focused on selecting which AI model to deploy, when the variable that actually determines success is how they manage the context around it.
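To make that concrete, here is a minimal sketch of context as a governable object. The names and fields are my own illustrative assumptions, not any real framework’s API:

```python
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    """The four ingredients of an agent's context (illustrative only)."""
    data_sources: list[str] = field(default_factory=list)  # what it can read
    goals: list[str] = field(default_factory=list)         # what it is told to optimize
    tools: list[str] = field(default_factory=list)         # what it can do
    memory: list[str] = field(default_factory=list)        # what it has accumulated

def is_governed(ctx: AgentContext, approved_sources: set[str]) -> bool:
    # Governance starts with a question most projects can't answer:
    # is every input to this context known and approved?
    return all(src in approved_sources for src in ctx.data_sources)
```

The point of writing it down this way is that each of the four ingredients becomes something you can inventory, approve, and audit.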
AI projects don’t fail because the technology doesn’t work. They fail because nobody managed the context.
As Kevin Scott put it: “Some of the things that you need to do to squeeze the capability out of these systems is just ugly-looking plumbing stuff.” Context management is that plumbing. It’s not glamorous. But it’s the difference between the 95% that fail and the 5% that don’t.
Two Data Supplies. One Context Window. No Label.
When I think about where context for AI can move fast versus where it will struggle, there’s a distinction that keeps coming back: is the system leveraging internal data, external data, or a hybrid of both? That distinction shapes how quickly an organization can adopt these new tools, and it drives the decision to buy or build the AI agent.
External data is the tax code, legal case law, regulatory requirements: data managed and maintained by someone else. Startups and established companies are leaning into this space, curating and grounding external data to add value to existing professional roles. The legal industry is a prime example: an enormous body of case law exists, but hard work is needed to ground an AI on top of it so it doesn’t hallucinate cases or present a defense argument as the court’s ruling. These are solvable problems. The data is findable and structured.
Internal data is another animal. It’s unique to the organization, and it may not be clean. Organizations right now are approaching it as, “let’s go through a process of cleaning the data, getting it ready, and then we’ll move forward with AI.” That cleanup work is not trivial. It’s months, sometimes years, of work.
When both data types feed into an agent’s single context window, the lower governance standard wins. An agent pulls a revenue figure from your ERP and enriches it with a market benchmark from a third-party API. If the external data is stale or wrong, your agent just made a confident recommendation built on a cracked foundation. Without the hard work of structuring and labeling the data, inside the context window all data looks the same.
Your agent doesn’t see ‘internal’ and ‘external.’ It sees context.
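One way to give the agent that missing distinction is to label provenance before anything enters the context window. A minimal sketch, with trust tiers and source names I’ve invented for illustration:

```python
from dataclasses import dataclass
from enum import IntEnum

class Trust(IntEnum):
    UNVERIFIED = 0        # e.g., a third-party benchmark of unknown freshness
    VENDOR_MANAGED = 1    # externally maintained under contract
    SYSTEM_OF_RECORD = 2  # your ERP: validated, audited, owned

@dataclass
class ContextItem:
    content: str
    source: str   # e.g., "erp.revenue" or "marketdata.api" (hypothetical names)
    trust: Trust

def effective_trust(items: list[ContextItem]) -> Trust:
    # A recommendation assembled from mixed sources is only as
    # trustworthy as its weakest input: the lower standard wins.
    return Trust(min(item.trust for item in items))
```

Combine the ERP figure (SYSTEM_OF_RECORD) with a stale benchmark (UNVERIFIED) and the recommendation inherits the UNVERIFIED label, which is exactly the warning that should surface to the user.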
The Three Fracture Points Between a Demo and a Deployment
The Q&A-to-action spectrum.
Finance departments recognize that AI can handle some of their reporting and analysis, especially now that MCP-enabled ERPs put that data within an agent’s reach. But what they’re trusting falls on one side of a specific line. They are essentially performing Q&A: summarization and retrieval, which can tolerate messier data. It’s pattern matching, finding needles in haystacks. But the moment you allow an agent to act (create records, update systems, trigger workflows), you need a level of data integrity most organizations don’t have yet. A wrong answer in a Q&A session is a retry button. A wrong action is an audit finding.
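That line can be drawn in code. A sketch of the idea, with hypothetical tool names that don’t reflect any real ERP’s API:

```python
from enum import Enum

class Mode(Enum):
    READ = "read"    # Q&A, summarization, retrieval: tolerates messy data
    WRITE = "write"  # create, update, trigger: demands data integrity

# Hypothetical tool registry: which side of the line each capability sits on.
TOOL_MODES = {
    "query_ledger": Mode.READ,
    "summarize_report": Mode.READ,
    "create_journal_entry": Mode.WRITE,
    "trigger_payment_run": Mode.WRITE,
}

def allow(tool: str, writes_enabled: bool = False) -> bool:
    """Reads pass by default; writes only once governance exists to enable them."""
    mode = TOOL_MODES.get(tool)
    if mode is Mode.READ:
        return True
    return mode is Mode.WRITE and writes_enabled
```

Reads pass by default. Writes stay off until you have the integrity and oversight to turn them on.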
The skill persistence trap.
The ability for an agent to learn a skill and not forget it will become critical. An agent that moves between systems through protocols like MCP, accumulating institutional knowledge, becomes more valuable over time, the same way consultants who spend years in one industry or on one account become invaluable because they have accumulated context. As my old coach used to say: practice makes permanent; perfect practice makes perfect execution. If an agent ingests a bad external skill early, that error compounds. Every subsequent action carries the inherited error forward. An agent that learns from bad data doesn’t get smarter. It gets confidently wrong.
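A governed version of “practice makes permanent” might look like this sketch, where skills only persist into memory after passing review. The hash registry and the review process behind it are assumptions for illustration:

```python
import hashlib

# Populated by a human review process; empty means nothing gets learned.
APPROVED_SKILL_HASHES: set[str] = set()

def vet_and_persist(name: str, skill_code: str, memory: dict[str, str]) -> bool:
    """Persist a skill only if it passed review; otherwise quarantine it.

    A bad skill, once persisted, rides along into every
    subsequent action the agent takes."""
    digest = hashlib.sha256(skill_code.encode()).hexdigest()
    if digest not in APPROVED_SKILL_HASHES:
        return False  # quarantined, not learned
    memory[name] = skill_code
    return True
```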
The fictitious data tell.
Fredrik Saetre recently did a live-stream video, #VibeConfig Live Stream, in which he built a finance supply chain system in an hour using natural language connected to Dynamics 365 ERP through MCP. It worked. And it worked because the context was clean: fictitious data with no quality issues, no legacy edge cases, no regulatory jurisdictions to navigate. The demo didn’t fail. It proved exactly what Kevin Scott is saying: the capability is there. It also proved that context is everything. Control the context and the system shines. Lose control of it and you’re back in the 95%.
The demo worked because the context was controlled. Production never is.
What Leaders Should Do Now
1. Audit your current AI initiatives against context governance and kill the ones that don’t have it.
If a project can’t answer “what data enters this agent’s context, and who validates it,” it belongs to the 95% that will struggle to deliver enterprise value. Continuing to fund it as-is doesn’t make it more likely to succeed. It makes the failure more expensive.
2. If your agent can write to a system of record today, pull that access by Friday.
Restore it only after you have established validation gates, audit trails, and human-in-the-loop checkpoints for every write operation; a minimal sketch of such a checkpoint follows this list. Every day an agent has ungoverned write access is a day you are accumulating risk you can’t see and may not discover until an audit finds it for you.
3. Put context governance on your next board agenda, not as an IT update but as a risk item.
If your board is reviewing AI strategy without reviewing how agent context is managed, governed, and audited, it’s reviewing a capability story without the risk story. The 95% failure rate is not a technology statistic. It’s a statistic about governance and organizational absorption. Treat it like one.
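On the second recommendation: a human-in-the-loop write checkpoint is small to sketch and large to skip. Everything below, from the log path to the function shape, is illustrative rather than a prescribed implementation:

```python
import json
import time
from typing import Optional

def gated_write(operation: str, payload: dict, approver: Optional[str]) -> bool:
    """Every write operation passes a human checkpoint and leaves an audit record."""
    approved = approver is not None
    entry = {
        "ts": time.time(),
        "operation": operation,
        "payload": payload,
        "approved_by": approver,
        "approved": approved,
    }
    with open("agent_audit.log", "a") as log:  # illustrative audit trail
        log.write(json.dumps(entry) + "\n")
    return approved  # the write proceeds only with a named approver on record
```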
OpenClaw showed what’s possible when an individual gets full context and an agent that learns. Users are doing things they didn’t previously think possible because the context got richer.
Kevin Scott is right: the capability overhang in existing models is real. And OpenClaw also showed what happens when you give those models unlimited control and vague context.
The hard work is the governance of data, the governance of tools, and the discipline to manage an agent’s context the way you manage the context of your newest employee. The organizations that win won’t be the ones that deployed first. They’ll be the ones that managed the context well enough to trust what their agents delivered.
AI doesn’t need good data to run. It needs governed data and managed context to be trusted.