New Research Shows AI Agents Are Running Wild Online, With Few Guardrails in Place
Autonomous AI systems are moving from lab demos to doing real work on the open web, and the safety playbook has not caught up.
A customer support chatbot quietly completes a multi-step refund, a scheduling assistant logs in to three third party sites, and a browser agent fills out a loan application without human supervision. The scene is familiar to product teams testing automation at scale, but it is also the setup for accidental data leaks, fraudulent transactions, and security incidents that look a lot like normal user behavior until it is too late. The easy headline is that AI agents are useful and inevitable; the harder truth is that they are increasingly active in public systems without consistent disclosure or technical guardrails, which matters more for the balance sheet than it does for headlines.
Most coverage treats agentic AI as a productivity revolution. The underreported business risk is governance fragmentation: when multiple vendors wrap the same foundation models in different orchestration layers, no single party owns the failure modes and compliance obligations, leaving enterprises exposed and auditors confused. That is the frame for the rest of this piece.
The dataset that woke up the industry
Researchers at MIT documented 30 prominent agent systems to create the first public registry mapping capabilities to safety practices. The resulting AI Agent Index found that many agents provide little to no safety documentation, that browser agents tend to operate with the highest autonomy, and that most agents fail to disclose third party testing results. These findings are laid out in the index and accompanying paper published by the MIT team. (MIT CSAIL AI Agent Index)
What the headlines missed about agent behaviour
Journalists noticed the most dramatic statistic first: a significant share of agents in the index behave in ways that make their traffic nearly indistinguishable from human users. That means normal website defenses cannot reliably separate a compliant agent from a scraper or a malicious automation. Reporting on the index amplified alarms across the industry. (Gizmodo)
Why vendors and platforms are suddenly in the hot seat
Many agent frameworks are thin wrappers around the same set of foundation models, which creates an opaque supply chain for risk. Vendors trumpet features like browsing and tool chaining but rarely publish the security testing that demonstrates safe operation in adversarial environments. The result is a market where a handful of large labs supply models and dozens of companies stitch those models into agentic products with variable safety practices. (The Register)
The new security vector nobody is ready for
Academic security work has moved beyond prompt-injection thought experiments to show concrete attack patterns where agents can be coerced into installing malware, exfiltrating credentials, or acting as covert command and control relays. Those experiments demonstrate that agentic systems expand the attack surface from textual outputs to active system operations. The technical details and exploit scenarios are described in a recent paper that shows how multiagent trust relationships and tool calling can be abused. (arXiv: The Dark Side of LLMs)
Autonomous agents are not just smarter chatbots; they are new classes of networked actors that need identity, authorization, and runtime governance at scale.
The identity problem that costs money
From a practical point of view, the simplest control is treating agents like employees by issuing them distinct identities and policies. Security practitioners from conferences and vendors are already pushing for agent credentialing, role based permissions, and audit logs that can be tied to business processes. Without these basics, an agent that can browse and act on behalf of users creates a single point of failure: misuse one credential and an attacker gets a quiet door into workflows that would otherwise require human interaction. (Axios)
The cost math every CTO should run tonight
If a browser agent is allowed to complete consumer purchases or access CRM records, a single misstep can create direct loss and regulatory exposure. For example, assume an agent processes 1,000 transactions per month with an average value of 150 dollars; a 0.5 percent unauthorized transaction rate from scraping or credential leakage equals 7,500 dollars in direct losses per month and a cascade of remediation and legal costs that could multiply that amount quickly. Add the cost of customer churn if compromised accounts are visible, and the ROI of unsupervised agent deployments can flip from positive to negative within a quarter. These are conservative numbers that do not include reputational damage or fines for privacy violations. Yes, the CFO will ask for a postmortem the size of a small novel. That will be awkward.
How small teams and big vendors will respond
Small engineering teams often prefer open source agent frameworks because they are fast and cheap to customize. That speed comes with sparse governance tooling. Enterprises should insist on vendor commitments for agent identities, signed traffic fingerprints, explicit user disclosures, and SOC 2 like attestations for agent operations. Large vendors have started announcing interoperability standards and a few cross industry initiatives to certify agent behavior, but those efforts are nascent and uneven.
Risks and open questions that still matter
Technical mitigations like system cards and tool level whitelists help, but they are not a panacea. Questions remain about who is liable when a third party plugin or API call enables harm, how to audit chained agent decisions that span vendors, and how regulators will treat autonomous nonhuman actors in contract law. Research shows that many models will obey peer agents even when they reject direct malicious prompts, which creates a trust amplification problem across multiagent systems. These are not hypothetical edge cases; they map directly onto enterprise risk registers.
A pragmatic way forward for businesses
Start by inventorying any agent that can act with a password or payment token and require distinct credentials and audit trails for each. Combine that with a staged deployment plan where agents operate behind human in the loop approvals for the first 90 days of production. Invest in synthetic adversary tests that try to coerce the agent into dangerous behaviors and demand remediation evidence from vendors before renewing contracts. This is operational discipline, not philosophy.
Key Takeaways
Key Takeaways
- AI agents are increasingly autonomous and often lack public safety testing or disclosure, creating real governance gaps.
- Treat agents as nonhuman employees by issuing identities, permissions, and audit logs before production rollout.
- A single compromised agent credential can create direct financial losses and regulatory exposure that outstrip early automation gains.
- Vendors and standards groups are moving, but enterprise controls must be implemented now to avoid avoidable incidents.
Frequently Asked Questions
Frequently Asked Questions
How do AI agents differ from traditional bots for compliance purposes?
AI agents can plan across multiple steps, call external tools, and make decisions without human prompts, which means they need identity, sessionization, and traceable tool calls to meet compliance standards. Traditional bots are usually single purpose and easier to sandbox.
Can an enterprise safely use a browser agent for customer service automation?
Yes, if the agent is constrained by strict credential scoping, transaction limits, and an approval workflow for sensitive actions. Without those limits, the agent introduces outsized risk compared to its operational benefits.
What technical tests should security teams run on an agent before deployment?
Run prompt injection and RAG backdoor simulations, multiagent trust exploitation tests, and adversarial browsing scenarios to confirm the agent cannot be coerced into harmful operations. Require vendors to produce third party red team reports for validation.
Will regulation make this problem go away for companies?
Regulation will shape obligations but will not eliminate the need for good engineering practices; businesses should not wait for rules that may take years to codify. Compliance and operational controls are complementary and both necessary.
How expensive is it to add agent governance compared to the value agents deliver?
Governance is often a modest fraction of total implementation cost, typically 5 to 20 percent of upfront engineering, and it reduces the probability of large losses that can exceed the automation benefits in a single incident.
Related Coverage
Related Coverage
Explore reporting on agentic AI adoption strategies, standardization efforts for nonhuman identities, and technical research into prompt and tool level exploits. The AI Era News recommends deeper reads on multiagent trust, platform security posture, and vendor assurance models for teams that plan to deploy agents this year.
SOURCES: https://gizmodo.com/new-research-shows-ai-agents-are-running-wild-online-with-few-guardrails-in-place-2000724181, https://arxiv.org/abs/2502.01635, https://arxiv.org/abs/2507.06850, https://www.axios.com/2025/05/06/ai-agents-identity-security-cyber-threats, https://www.theregister.com/2026/02/20/ai_agents_abound_unbound_by/