AI Security, A Different Approach

There are a number of approaches to application security proven useful over the years: web application scanning, static analysis, and methodologies that embed security into the software development lifecycle like “shift left.” Did any of these prepare us for the AI revolution? Early on, security practitioners noticed many of the old attack techniques we’d built defenses against were being used effectively against AI systems. How could this be the case? AI companies didn’t know as much as they thought they knew about security? Old school became, new school for AI researchers? How else could it be that 20yr old attack techniques were bypassing AI security controls? To their credit, AI companies learn fast, most are adding salty security experts to their ranks.

Old Tricks, New Targets

Remember Leet Speak or where it came from? Probably not, but that’s ok, since it was created before most of my readers were born. The character substitution game from 1990s BBS culture, replacing letters with numbers and symbols (h4ck3r, p@ssw0rd)? It turns out this decades-old technique is devastatingly effective against modern AI safety guardrails.

A 2025 study testing six major guardrail systems, including Azure Prompt Shield, Meta Prompt Guard, and Nvidia NeMo Guard, found that simple leet speak substitutions achieved an 81% bypass rate on injection detection and a 95% bypass rate on jailbreak detection. Unicode tricks pushed those numbers even higher. Emoji smuggling, where text is embedded in emoji variation selectors, achieved a 100% bypass rate across several commercial guardrails.

Why does Leet work? LLMs were trained on the messy reality of internet text, full of leet speak, Cyrillic lookalikes, and creative misspellings. The model reads right through the obfuscation. But the safety classifier sitting in front of the LLM often doesn’t. It’s a gap between what the guardrail sees and what the model understands, and attackers are walked right on through defenses.

Prompt Injection: The Dominant Threat

Prompt injection now sits at #1 on the OWASP Top 10 for LLM Applications. It’s the most persistent, highest-severity vulnerability in production AI systems, and it’s not going away anytime soon.

The prompt injection attacks fall into two categories. Direct injection is where users craft prompts to override system instructions, through role-playing scenarios, encoding tricks, or format manipulation like HiddenLayer’s Policy Puppetry technique, a universal jailbreak that worked on all major frontier models. Indirect injection is more insidious, malicious instructions hidden in documents, websites, or code that the AI processes without the user ever seeing the attack.

The real-world consequences are already here. In 2025, researchers demonstrated that attackers could embed malicious instructions in public GitHub repositories, hidden in HTML comments, docstrings, or whitespace, causing Copilot to exfiltrate secrets from developers’ private repos. GitHub’s fix? They disabled image rendering in Copilot Chat entirely. Not a surgical fix, a feature removal. Targeted defenses weren’t enough?

As for defenses, the assessment is sobering. A joint study involving researchers across OpenAI, Anthropic, and Google DeepMind tested 12 published defenses using adaptive attack methods. Defenses that originally reported near-zero attack success rates were all bypassed at 90%+ rates under adaptive conditions. Every published defense has been broken. Prompt injection isn’t a bug, it’s an architectural consequence of how LLMs mix instructions and data in the same channel.

A Different Approach: Runtime Security Visibility

This is where I think the conversation needs to shift. We’re spending enormous energy trying to prevent bad inputs from reaching models, and that work matters. But what if we also watched what applications actually do once instructions are processed?

That’s the idea behind JVMXRay POC, an open-source security monitoring tool I’ve been building for Java applications. It was presented at Black Hat Arsenal USA 2020, and its core premise is simple: instrument the JVM itself to make application behavior visible in real time. JVMXRay uses bytecode injection to attach 19 sensors to a running Java application covering domains like,

File I/O
Network connections
SQL queries
Cryptographic operations
Authentication events
Process execution
Deserialization/Reflection and more…

JVMXRay requires zero code changes to the target application being monitored. In fact, it’s likely your application won’t know it’s being monitored. You attach it at your applications launch time like the following:

java -javaagent:jvmxray-agent.jar -jar yourapp.jar

Your actual command line will likely be a little more complicated due to your app server requirements but it’s otherwise accurate, moving on. Think of JVMXRay as a source of truth for your application. Not what your application is supposed to do but what it’s actually doing. What files it’s reading, what network connections it’s making, what system properties it’s accessing, what processes it’s spawning. Every operation, correlated into traceable event chains.

Why This Matters for AI Security

Consider the current landscape: developer tools powered by AI are scaling rapidly, but security isn’t keeping pace. 83% of organizations plan to deploy agentic AI, but only 29% feel ready to secure it. The attack surface is expanding faster than defenses can cover it. Aside from studies it’s intuitive, AI coding tools have 10x-20x developer code velocity but security staff and tooling have not kept pace.

This is where runtime monitoring becomes a force multiplier. If an AI agent in your Java application is compromised through prompt injection and starts reading files it shouldn’t, opening unexpected network connections, or executing system commands, JVMXRay sees it. Not because it understood the prompt injection. Because it’s watching the application behavior at runtime.

JVMXRay’s Mapped Correlation Context (MCC) system traces events across the full execution path. An HTTP request that triggers a SQL query that leads to a file read shows up as a single correlated chain, HTTP → SQL → FileIO , with a shared trace ID. You can reconstruct exactly what happened, in what order, and why. That’s the kind of visibility that turns an invisible attack into a visible one.

Frontier Security: No Code Required

There’s a practical argument here that I think gets overlooked. Most security tools require access to source code, for static analysis, for instrumentation, for integration. But in enterprise environments, you’re often running commercial Java applications, third-party libraries, and vendor software where source code isn’t available.

JVMXRay doesn’t need source code. It operates at the JVM bytecode level, which means it works with any Java application, in-house, commercial, or open source. No code changes, no recompilation, no vendor cooperation required. Attach the agent, and you have visibility. That makes deployment fast and removes the barriers that slow down security adoption in exactly the environments where it’s needed most.

Events are structured and machine-readable, designed to feed into centralized logging platforms like Splunk, ELK, or Datadog, and they’re built to be consumed by AI-powered analysis tools. The structured format means you can layer your current automated threat detection and log tools on top of the raw behavioral data.

Beyond Security

It’s worth mentioning that monitoring application events has benefits far beyond security. While JVMXRay is developed with security in mind there’s definitely opportunity for stretch fit and overlap into other domains like, diagnostics, usage tracking, and perhaps even auditing. The true state of what is occuring actively in the application is important to all these domains. Many of the current sensors already provide some dual use; for instance, the resource monitor. Understanding how resources are allocated and released over time is primarily a performance optimization feature yet it’s also a security concern. Spikes in resource utilization can indicate a feature bug but it can also signal a DDOS attack. Following is an example of a monitor event fired at regular user configured intervals.

C:AP | 2024.09.15 at 14:30:25 EDT | jvmxray.monitor-1 | INFO | org.jvmxray.events.monitor |
MemoryTotal=512MB|MemoryFree=256MB|MemoryMax=1GB|ThreadNew=0|ThreadRunnable=15|
ThreadBlocked=0|ThreadWaiting=8|ThreadTerminated=0|OpenFiles=42|ProcessCpuLoad=12.5%|
GCCount=25|GCTime=150ms|NonHeapUsed=64MB|DeadlockedThreads=0|
LogBufferUtilization=5%|LogQueueSize=50|LogDiscardCount=0|
mcc_contexts_created=1250|mcc_active_contexts=3|mcc_ttl_cleanups=0|
lib_static_loaded=45|lib_dynamic_loaded=2|lib_total_packages=128|lib_cache_size=47

Resource utilization over time information is great starting point of truth for your SOC. With such highly structured information, it’s relatively easy work to graph application resources over time. There’s also other sensors like the uncaught exception handler. When the handler catches an unhandled exception it produces and event and sends a message to the logging services that looks like.

C:AP | 2024.09.15 at 14:30:25 EDT | pool-1-thread-3 | INFO | org.jvmxray.events.system.uncaughtexception |
thread_name=pool-1-thread-3|thread_id=42|thread_state=RUNNABLE|thread_priority=5|
thread_daemon=false|thread_group=main|exception_type=java.lang.NullPointerException|
exception_message=Cannot invoke method on null reference|
exception_location=com.example.service.OrderService:127|exception_method=processOrder|
stack_depth=15|incident_id=c3a1b2d4-e5f6-7890-abcd-ef1234567890|
root_cause_type=java.lang.NullPointerException|root_cause_message=Cannot invoke method on null reference|
stack_trace=com.example.service.OrderService.processOrder(OrderService.java:127) > com.example.controller.OrderController.submit(OrderController.java:45)

There’s other more powerful tools in these categories, we all have our favorites, including myself. My goal with JVMXRay is not to compete with heavy hitter commercial tools. But instead provide a security first tool that sets a value baseline for us all.

Beyond the value add around JVMXRay technical capabilities there’s also a soft value, high quality diagnostic information reinforces cooperation and business relationships between security and application developers. Security is perceived less as an obstacle to progress and more of a business partner that can help identify feature problems before engineering receives weekend and late night phone calls.

What’s Missing: Enforcement

I want to be honest about where things stand. Today, JVMXRay brings visibility and that’s an essential first step. You can’t defend what you can’t see. But visibility without enforcement is monitoring, not protection.

There’s work is ahead of us. But the foundation begins with visibility. You have to understand normal before you can define and enforce boundaries. And in a world where prompt injection defenses are being bypassed at 90%+ rates, having an independent layer that watches actual behavior, regardless of how the application was tricked into that behavior, is not optional. It’s essential.

The AI security problem won’t be solved by any single tool or technique. But I believe runtime behavioral monitoring deserves a seat at the table alongside input filtering, output validation, and all the other approaches we’re developing. Defense in depth has always been the right strategy. JVMXRay is one more useful layer, and watching what matters most: what actually happens on applications.

References

Bypassing Prompt Injection and Jailbreak Detection in LLM Guardrails, arXiv, 2025
OWASP Top 10 for LLM Applications, LLM01: Prompt Injection
Novel Universal Bypass for All Major LLMs (Policy Puppetry), HiddenLayer
Prompt Injection Attacks on Agentic Coding Assistants, arXiv, 2026
Prompt Injection Attacks in 2025, PremAI
JVMXRay, Java Security Monitoring, GitHub