ClareNow

Hiring AI Agents Is More Dangerous Than You Think

Telling the model to behave does not work. Probabilistic systems do not yield deterministic security outcomes, no matter how carefully the system prompt is written.

Shreyans Mehta, Forbes Councils Member Forbes 28 May 2026, 11:45 3 min read 8/10

Hiring AI Agents Is More Dangerous Than You Think

Key Takeaways

Indirect prompt injection attacks can turn an AI agent into a data exfiltration tool; researchers demonstrated this against Salesforce in early 2026, stealing thousands of customer records without triggering alerts.
A real-estate AI agent hallucinated fraudulent lease terms costing a property firm $2.3 million before the error was caught—highlighting financial risks from autonomous decision-making.
Enterprise spending on AI agent platforms hit $12.6 billion in 2025 (Gartner), with projected tripling by 2028, making security a multi-billion dollar blind spot.
The 'autonomy paradox' states that as AI agents gain more capabilities, the potential damage from a single compromised instance increases exponentially—unlike traditional software vulnerabilities.
EU AI Act now classifies autonomous AI agents as high-risk systems, requiring mandatory security audits and continuous monitoring before deployment.

The promise of AI agents—autonomous software that can book meetings, write code, and make purchases—is colliding with a hard reality: you cannot simply tell a probabilistic system to behave. Security researchers warn that prompt injection, jailbreaks, and adversarial inputs turn these digital employees into liabilities. A new Forbes analysis argues that hiring AI agents introduces risks that traditional cybersecurity cannot fix.

Forbes tech council contributor warns that organizations rushing to deploy AI agents are overlooking a fundamental flaw: these systems are probabilistic, not deterministic. No matter how carefully a system prompt is written, a clever adversary can manipulate the model to bypass safeguards. The article, published May 28, 2026, says, "Telling the model to behave does not work. Probabilistic systems do not yield deterministic security outcomes."

The problem is not academic. In early 2026, researchers demonstrated prompt injection attacks that turned AI agents into phishing tools, exfiltrating customer data from Salesforce instances. Another attack caused a real-estate AI agent to hallucinate fraudulent lease terms, costing a property firm $2.3 million before detection. These incidents mirror the growing anxiety in the AI safety community that autonomous agents operate with too much autonomy and too little oversight.

Why now? Enterprises spent $12.6 billion on AI agent platforms in 2025, according to Gartner, and that figure is expected to triple by 2028. Every customer service chatbot, every automated DevOps assistant, every personal shopping agent represents a new attack surface. Unlike traditional software, AI agents are not bound by fixed logic; they can take actions in the real world—sending emails, executing API calls, modifying databases—based on ambiguous instructions.

The core risk stems from large language models' susceptibility to adversarial prompts. A technique called "indirect prompt injection" allows a malicious website or email to inject instructions into the agent's context window without the user's knowledge. Once injected, the agent may follow those instructions, overriding its original system prompt. For example, a recruitment AI agent could be tricked into leaking candidate resumes by a hidden instruction in a job description.

Security professionals call this the "autonomy paradox": the more capable and autonomous an AI agent becomes, the more damage a single compromised agent can cause. Unlike a macro virus or a misconfigured API, an AI agent can reason, improvise, and chain together actions that no human anticipated. The Forbes piece draws a parallel to the early days of cloud computing, when organizations moved workloads without fully understanding shared responsibility models. Today, many companies deploy AI agents without understanding that system prompts are not security boundaries.

What happens next? The industry is racing to develop guardrails. Companies like Anthropic, Google, and Microsoft are investing in "constitutional AI" and reinforcement learning from human feedback (RLHF) to align agent behavior. But these techniques reduce risk, not eliminate it. Regulatory frameworks are emerging: the EU AI Act now classifies autonomous agents as high-risk, and the U.S. National Institute of Standards and Technology is drafting specific guidelines for AI agent security testing. Companies that treat AI agents as just another software tool rather than a novel security threat will be the ones making headlines for all the wrong reasons.

The bottom line: hiring an AI agent is not like hiring a human. You cannot train it, trust it, or fire it. You must continuously monitor, constrain, and audit it—and even then, expect surprises.

Frequently Asked Questions

The main risks include prompt injection attacks where malicious inputs override system prompts, data exfiltration, hallucinated actions leading to financial loss, and the autonomy paradox where increased capability leads to greater potential damage from a single compromise.

No, because they are probabilistic systems. Even with careful system prompts and training (RLHF), adversarial inputs can manipulate agents into acting against intended ethics. Continuous monitoring and guardrails are required, but absolute trust is not possible.

A prompt injection attack involves inserting malicious instructions into the AI agent's input context—for example, through a hidden line in an email or website. The agent may follow those instructions, ignoring its original system prompt, leading to unauthorized actions.

Enterprise spending on AI agent platforms reached $12.6 billion in 2025, according to Gartner, and is projected to triple by 2028, reflecting rapid adoption despite security concerns.

The EU AI Act classifies autonomous AI agents as high-risk, requiring security audits and continuous monitoring. The U.S. NIST is drafting specific guidelines for AI agent security testing. No global standard exists yet.

The autonomy paradox describes how granting AI agents more autonomy and capability increases the risk of severe damage if the agent is compromised. Unlike traditional software vulnerabilities, a compromised agent can reason and chain actions in ways developers may not anticipate.

Original source

www.forbes.com

Read original

Discussion

Join the discussion

No comments yet. Be the first to share your thoughts!