Double Agents - Coding Agents Going Awry

When your tools become compromised

Apr 24, 2025

Also, you can find my book on building an application security program on Amazon or Manning

Many moons ago I actually wrote code for production environments in the healthcare space. While some of the muscle memory may have been laying dormant for the past several years, I still know what writing code should look like. Lately I’ve been immersing myself back into code development and getting reacquainted with old pains associated with it. However, much has changed. And when I say much, I mean everything.

While “auto-complete” functionality has existed in IDE’s (integrated development environment) for quite some time, tools like GitHub CoPilot and Cursor take it lightyears further with the integration of AI-powered assistants. These assistants enable developers to be more efficient in writing code, but also bring with it fresh opportunities for attackers to take advantage of. So, what are these code agents and how are they used?

What are code agents

The short of it is that code agents allow for developers to focus more on the creative aspects of software as opposed to the tedium of repetitive and mundane coding. And it is certainly catching on with developers. According to GitHub, a staggering 97% of developers who try AI coding assistants continue using them. From solo developers to Fortune 500 engineering teams, code agents have become an integral part of software development transforming workflows and productivity.

Code agents integrate with an IDE such as VSCode, JetBrains, Eclipse, and Cursor. While CoPilot and Cursor are two examples of coding agents, there are others like Tabnine, Amazon Q, and Codeium. These coding support tools offer:

AI Models to understand the inputs from the developer and generate output code.
Context awareness to scan and understand the current file, project, and comments to consider developer intent.
Agent predictions to provide code completion as you type as well as the creation of boilerplate or repetitive code.
Support for multiple languages and collaboration across teams.

The code agent supported IDE allows for developers to write code faster, reduce errors created by hand written code (I can attest), and offer the ability for developers to learn unfamiliar languages and code. These assistants vary depending on the specific one, but they essentially follow a typical pattern:

Capturing the Partial Input: When a developer types a partial line of code, the code agent immediately detects the input within the IDE. For example:

def calculate_sum(

The agent recognizes this as the beginning of a function definition in Python.

Context Gathering: The agent then gathers the context from multiple sources like the active file, project files, and code comments to understand the developer's intent. The agent also uses it’s knowledge of the programming language to prepare a response that follows the languages syntax and conventions.

Building the Prompt: The agent constructs a prompt for its underlying AI model such as the following:

The partial input (def calculate_sum(.
Relevant context from the file or project (e.g., other mathematical functions, imported libraries, etc).
Any explicit instructions from comments or rules files (more on rules files in a bit)

The agent may then build a prompt to send to the model that looks like this:

Complete the following Python function definition: def calculate_sum(numbers): # This function should calculate the sum of a list of integers.

Model Processing: The AI model processes the prompt and generates a suggestion based on the code that it has been trained on and can infer common patterns for function definitions. In other words, it’s seen this type of function before and can build a response based on its knowledge. However, it also takes into consideration the agent provided context in order to tailor the suggestions for the given request and produce a potential suggestion:

def calculate_sum(numbers): return sum(numbers)

Model Suggestion: The agent presents the generated suggestion inline within the IDE, allowing the developer to choose a few options (mileage will vary based on the IDE, but it may look like this):

Accept: Automatically insert the suggestion into the code.
Modify: Edit the suggestion to better fit their needs.
Reject: Ignore the suggestion and continue typing manually.

Incorporating Developer Feedback: If the developer accepts or modifies the suggestion, the agent learns from this interaction to refine future recommendations. For example:

If the developer frequently uses numpy.sum() instead of sum(), the agent may prioritize numpy in future suggestions.

Obviously on a smaller scale, this can be considered glorified auto complete. But the real power of the agents comes with being able to perform more advanced tasks such as creating entire classes from a few lines of prompt, interpreting code that the developer is inquiring about, or even creating an entire code repository from scratch. These examples are massive advances and time savers for software developers. But, brining it back to the developer experience, the real-time assistance that agents provide can keep developers focused on their workflow creation while meeting the context and customizations required by the particular task.

That context and customization can be controlled by rules files written in JSON, XML, or YAML.

Example:
"namingConventions": { "variables": "camelCase", "classes": "PascalCase", "functions": "snake_case"}

Think of rules files as the organization’s guiding principles on coding context, standards, and frameworks that the suggestions made by the agent must follow. It’s the boundaries and guardrails around the agent. And yes, of course, it can be compromised.

So, what’s wrong with a little help?

In general, code agents are a boon for productivity when it comes to developing software. However, we are learning how attackers can influence AI to produce results that can turn into vulnerable software. Rules files are just one example. The rules files that govern the responses to prompts has been recently found to be vulnerable to influence from an attacker. In this case the attack (being termed "Rules File Backdoor") exploits how code agents process their rules files to inject malicious instructions that remain virtually invisible to human reviewers.

Rules File Backdoor involves three primary components:

Unicode Manipulation: Attackers embed invisible control characters and directional overrides within configuration files. While these characters don't render visibly in most text editors (bypassing human inspection) they alter how the agent processes instructions.
Context Poisoning: By crafting natural language instructions that appear harmless to humans but contain these linguistic tricks, attackers can inject commands that the model will run.
Model Exploitation: The attack leverages how large language models process information through prediction rather than executing logic. This difference makes these systems vulnerable to adversarial inputs that wouldn't affect traditional software.

What makes this attack challenging is its ability to go undetected. When you review a poisoned rules file, everything looks completely normal to the extent that even a code review is not likely to uncover the malicious instructions considering that humans simply don't see it when reading code. But the agent does.

When trust is a vulnerability

In the Application Security practice we often assume that configuration files are free from external influence. We tend to spend careful attention to changes to source code but often wave through updates to configs and rule sets with minimal review considering they're just telling our tools how to behave, not perform actions.

Rule Files shatter that assumption.

These files are instruction sets that directly influence what code gets written. That innocent looking rules file might be silently instructing the agent to add authentication bypasses, insert data exfiltration mechanisms, or create deliberate logic flaws. All while looking completely legitimate to reviewers. Also, our existing controls are designed to look for logic and code flaws in lines of code and runtime behavior, not in how the agent is being instructed on how to generate code.

The persistence of this attack vector is particularly troubling. Once a poisoned rules file enters a repository, it silently propagates through forks, branches, and updates. Each developer who pulls the repo inherits the compromised configuration, and every piece of code subsequently generated carries the influence of those hidden instructions. In other words, this backdoor can remain undetected throughout countless code contributions.

You might be asking where these rules files come from, how do they get into the process and eventually effect an agent. Unlike traditional vulnerabilities that require specific exploits, rules file backdoors leverage our existing sharing infrastructure as their distribution mechanism. They are not written internally by the developers in the organization (although they could be through an internal bad actor), but are instead written by external actors and propagated through the existing trust built by developers that rely on 3^rd party code and features that exist in the wild. For instance, poisoned rules files can be found in some of these locations:

Community Forums: Developers frequently share "improved" configurations and rule sets on Stack Overflow, Reddit, and Discord servers. A single poisoned configuration posted in a popular thread could be downloaded by thousands.
Open-Source Templates: Starter repositories and templates are the foundation for countless new projects. A compromised template repository could spawn hundreds of vulnerable downstream projects before anyone notices.
Internal Knowledge Bases: Corporate wikis and team documentation often include recommended configurations that, once poisoned, quickly become the standard across entire organizations.

When we think about supply chain security our minds usually go to the hardware and software components that we use in our stack, but we don’t often think about the configuration of development tools as a point of weakness. And while insecure configuration is a commonly known security weakness, we again tend to think about that in the context of the hardware and software that we use, not our IDE.

It’s time to change that mindset.

One successfully poisoned repository could compromise dozens of dependent projects, which in turn affect their own dependencies. The progression of this attack vector makes traditional vulnerability management approaches nearly useless. By the time you've identified the source, the infection has already spread throughout the ecosystem.

Where does that leave us?

As code agents become more deeply integrated into development workflows, security and engineering teams need to look beyond the usual suspects when it comes to security weaknesses. There are a few practical strategies to begin taking:

Extend secure code reviews to include configuration files: Rule files for AI assistants should now be treated with the same scrutiny as executable code. Implement strict change control processes that include:

Mandatory peer reviews specifically focused on AI configuration files and especially targeting ones that are newly introduced from an external entity.
Version control and approval workflows for all rule files, even those considered "internal" or "development-only"
Digitally signed configuration files to establish a chain of trust and prevent unauthorized modifications

Enhance your detection capabilities: Traditional code scanning tools weren't designed to catch these sophisticated AI manipulation attacks. Security teams should:

Implement Unicode analysis tools that flag invisible characters, bidirectional text controls, and other obfuscation techniques
Create validation scripts that normalize text and check for discrepancies between what's displayed and what's actually processed
Build formatting standards that strip potentially dangerous characters before configuration files are accepted into repositories
Audit agent prompts if you have the ability to see what the agent is actually prompting the model when making a request

Provenance and Tracking: Know where your agent rules come from, how it was created, and who modified it:

Establish trusted sources for AI configuration templates and rules
Create a configuration registry that tracks the lineage and change history of all rule files
Implement automated checks that validate rule files against known-good templates before deployment
Once a rule file has been approved for use move it to an internal and trusted repository within the organization where rule files are pulled from

Runtime Monitoring: Monitor what your AI assistants are actually producing:

Implement AI-generated code scanning that looks specifically for patterns associated with known attack techniques
Create guardrails that flag unusual AI behaviors, such as generating code with obfuscated functions or unusual network calls
Perform regular audits of AI suggestions and auto-completions to identify potential manipulations

Education and practices: Just like any other aspect of cybersecurity, staying ahead of the changes and implementation of technology goes a long way:

Redefine the attack surface to include AI tools, configurations, and the interactions between human developers and AI assistants
Build AI-aware security practices into every phase of the SDLC, from design to deployment
Invest in research and education to stay ahead of emerging AI security threats

When implementing these controls, remember that threat actors only need to succeed once. A single compromised rules file that escapes detection can potentially impact thousands of code commits across multiple projects. So, as we often shout from the rooftops (or maybe that’s just me) we have to utilize a defense in depth strategy that relies on multiple overlapping controls to provide overall security.

Time to stop using agents?

Of course not.

While agent usage will continue to grow, we are likely to see AI take bigger steps in the development of technology over the coming years. The scenario that Rules File Backdoor highlights is just a small example of what is to come as we wrangle to get security controls around this technology. We have to treat AI as we would any other privileged developer in our environments. We've, in essence, invited a new actor into our development process. One that thinks differently than we do, processes information differently than we do, and can be manipulated in ways we're still discovering.

The good news? We're security professionals. Adapting to new threats is what we do. By treating AI as both a tool and a potential attack vector, we can build the next generation of secure systems that allow for human creativity that leverages AI capabilities. Building that partnership starts with understanding the threats and implementing defensive strategies that keep our systems secure in the AI powered future.