Most of the AI security conversation right now centers on prompt injection, jailbreaks, and guardrails. That work matters, but it skips a more fundamental question: when your AI-enabled application executes, do you know what it is actually doing? Not what your code says it was supposed to do. What it is actually doing. AI security begins with application observability.
Observability Starts With the Truth
Observability is trendy, but what does it really mean? In software, the code your developers write is the part you planned, but it's probably only a small fraction of the code in the application stack. For instance, Sonatype notes,
“Commercial state-of-the-art software is built from as much as 90% open source code, including hundreds of discrete libraries in a single application.” — Sonatype, 10th Annual State of the Software Supply Chain Report (2024), p. 50
Read the converse: only about 10% of application code is written by your own developers. Yes, 10%. The point is that vulnerabilities can live anywhere, including your software supply chain. Are you reviewing your supply chain code? Every time an open source library developer pushes an update, do you review it? What about transitive dependencies, the code your third parties include, and the code their code includes, and so on? Do you even know what those dependencies are? Sure, you have an SBOM, but do you know it's accurate, and if so, do you know whether a vulnerable library was actually loaded? I doubt it.

In fact, if you're using AI code generation tools, it's likely you don't even know what your 10% looks like. If your security staff and static analysis tooling aren't keeping pace with the code velocity of AI generation tools, you're not alone. I'm not trying to make you feel bad about your software development practices, just pointing out some gaps. We are all using a lot of code we never wrote: some written by AI, many open source libraries, and commercial software with no code for you to review (though you're still responsible to your customers for its security quality). While your developers apply security diligence to the sliver of code they write, gaps remain (more like chasms, I'm afraid). Most of the security bugs live in the chasms. It's deep, dark, warm, and comfy down there.
You can close some of the gaps with your own logging, and many do; it helps. Consider: a developer logs log.info("user logged in: " + user), an operator greps (or Splunks) the log, you get some useful info, and life goes on. That model worked when teams and applications were small, single-process, and human-scale. It doesn't scale well to distributed systems, where developers of varying skill levels cycle in and out of projects, and it definitely doesn't scale to AI-assisted applications, where behavior is partly driven by a probabilistic model reading untrusted text. To defend a modern application you need a ground-truth stream of what the application is doing, in real time, at a resolution an automated system can reason about. That stream is the aspirational goal of observability.
Why Classic Logging Fails as Ground Truth
Most applications still emit logs that look like this,
2026-04-17 09:14:32 INFO User admin logged in from 192.168.1.50
2026-04-17 09:14:33 WARN Failed login attempt for user jsmith
2026-04-17 09:14:35 ERROR SQL query failed: SELECT * FROM users WHERE id=1 OR 1=1
2026-04-17 09:15:01 INFO File /etc/passwd was accessed by process 4421
Human-readable, yes. Machine-usable as a security signal? Not really. Pulling the IP out of logged in from 192.168.1.50 needs a regex that breaks the first time someone rewords the message. Connecting the failed login to the injection attempt to the file read is a human job. And the hardest failure mode is the silent one: the context you needed wasn't logged at all.
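Here's the fragility in miniature, a minimal sketch of the regex-harvesting approach (invented for illustration, not from any particular codebase). The moment someone rewords "logged in from", extraction silently returns nothing:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FragileLogParser {
    // Tied to the exact phrase "logged in from"; a reworded message matches nothing.
    private static final Pattern LOGIN_IP =
        Pattern.compile("logged in from (\\d{1,3}(?:\\.\\d{1,3}){3})");

    public static String extractIp(String logLine) {
        Matcher m = LOGIN_IP.matcher(logLine);
        return m.find() ? m.group(1) : null; // null means the signal is silently lost
    }

    public static void main(String[] args) {
        System.out.println(extractIp("INFO User admin logged in from 192.168.1.50")); // 192.168.1.50
        System.out.println(extractIp("INFO User admin signed in from 192.168.1.50")); // null
    }
}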
Structured logging fixes the shape of the data. Every event becomes a set of typed, named fields, usually JSON, rather than a string,
{
  "timestamp": "2026-04-17T09:14:32Z",
  "event_type": "auth.session",
  "auth_action": "login",
  "auth_success": true,
  "principal_name": "admin",
  "client_ip": "192.168.1.50",
  "trace_id": "abc-123-def"
}
Now every field is individually queryable. A SIEM can index client_ip, alert on auth_success: false rates, and join events on trace_id, no free-form text parsing, no regex pipeline rotting every time someone changes a log message.
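Here's a minimal sketch of what emitting such an event can look like in application code, assuming Jackson (com.fasterxml.jackson.databind) for serialization; a real deployment would hand the event to a logging framework configured with a JSON encoder rather than printing it:

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import java.time.Instant;

public class StructuredAuthLogger {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Emits one JSON object per line, matching the field names above.
    public static void logLogin(String principal, String clientIp,
                                boolean success, String traceId) throws Exception {
        ObjectNode event = MAPPER.createObjectNode();
        event.put("timestamp", Instant.now().toString());
        event.put("event_type", "auth.session");
        event.put("auth_action", "login");
        event.put("auth_success", success);
        event.put("principal_name", principal);
        event.put("client_ip", clientIp);
        event.put("trace_id", traceId);
        System.out.println(MAPPER.writeValueAsString(event));
    }

    public static void main(String[] args) throws Exception {
        logLogin("admin", "192.168.1.50", true, "abc-123-def");
    }
}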
Super-Charging the Logging Stack You Already Own
Most organizations already run a centralized logging platform like Splunk, Elastic/ELK, Datadog, or Grafana Loki. These platforms are good at what they do. What they can’t do is invent application log metadata that never existed. The ceiling on detection quality is set at the source, the application. If the application emits user admin logged in, Splunk can only do so much. If the application emits a typed event with principal_name, client_ip, auth_success, and trace_id, the same Splunk dashboard becomes dramatically more powerful, and the ML and correlation features the vendor has been shipping for years finally have something to work with. Structured events are a force multiplier for your current logging infrastructure.
For Java, Ground Truth Starts With JVMXRay
The obvious objection: our developers would have to restructure every log statement. Even if you did that, you'd only be addressing the 10%, your code, remember? For greenfield code with few open source dependencies, maybe it's a tenable task. For the commercial, third-party, and legacy Java that runs most enterprises, it isn't going to happen. This is the problem JVMXRay, an open-source proof-of-concept Java security monitoring agent I've been building (presented at Black Hat Arsenal USA 2020), is meant to solve. JVMXRay attaches to the JVM at launch as a -javaagent, uses bytecode injection to wire 19 sensors into a running application, and emits richly structured security events, with zero code changes to the target application. It doesn't care about source code. Your application probably won't even know it's being monitored.
java -javaagent:jvmxray-agent.jar -jar yourapp.jar
Your actual launch line will be a little more complicated to fit your app server, but the shape really is that simple. Every event JVMXRay emits shares a universal field set: timestamp, trace_id, caller, thread_id, and scope_chain, plus agent and configuration identifiers. On top of that base, each sensor adds domain-specific fields; a tour of the more interesting ones follows, right after a quick look at the agent mechanism itself.
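For readers unfamiliar with -javaagent: it's a jar whose manifest names a Premain-Class, and the JVM calls that class's premain() before the application's main(), handing it an Instrumentation hook for registering bytecode transformers. A stripped-down skeleton of the mechanism (illustrative, not JVMXRay source) looks like this:

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class MinimalAgent {
    // Invoked by the JVM before main() when launched with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                // A real agent rewrites classfileBuffer here to inject sensor
                // calls as each class loads; returning null leaves it unchanged.
                return null;
            }
        });
    }
}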
HTTP, SQL, File I/O, Network, and Crypto
HTTP requests carry request_method, uri, client_ip, user-agent, request_size_bytes, plus inline threat indicators like path_traversal_attempt, sql_injection_pattern, xss_pattern, and a rolled-up risk_indicators_count. Responses capture status, response_time_ms, and security-header compliance: csp_present, hsts_present, frame_options, security_headers_missing_count. A single event can answer: did this request contain an injection attempt, and did the response ship without proper headers?
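To illustrate, a single such event might look something like this. The field names come from the list above; the envelope and values are invented for the example, not verbatim JVMXRay output:

{
  "timestamp": "2026-04-17T09:14:35Z",
  "trace_id": "abc-123-def",
  "scope_chain": "HTTP",
  "request_method": "GET",
  "uri": "/users?id=1 OR 1=1",
  "client_ip": "192.168.1.50",
  "sql_injection_pattern": true,
  "path_traversal_attempt": false,
  "risk_indicators_count": 1,
  "security_headers_missing_count": 2
}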
SQL events include sql_text, sql_operation_type (SELECT/INSERT/UPDATE/DELETE/DDL), is_parameterized, parameter_count, db_url, db_user, and a sql_hash. Completion events add duration_ms, status, sql_state, and error_code. Detecting non-parameterized queries or flagging anomalously slow queries collapses to a single SIEM filter.
File I/O events carry operation, resolved paths (original_path, absolute_path, canonical_path), world_writable, is_symlink, and is_sensitive, enough to detect symlink attacks or unusual writes to sensitive paths. Socket events classify connections (is_loopback, is_private_ip, connection_direction) and, for TLS, capture ssl_protocol, ssl_protocol_deprecated, ssl_cipher_suite, and ssl_cipher_weak. Crypto events go further, pre-computing compliance assessments: fips_140_compliant, pci_dss_compliant, nist_status, and a suggested_replacement. A deserialization event telling you a known gadget chain class just surfaced on the wire is immediately actionable without a human interpreting a log line.
Correlation: the Trace ID Story
A single structured event is useful. A chain of them is where the story lives. JVMXRay events carry a trace_id plus a scope_chain, for example HTTP>SQL>FileIO, showing the nesting path. An inbound HTTP request with a SQL injection payload, the non-parameterized query it triggered, the /etc/passwd read that followed, and the outbound network connection it opened all share the same trace_id. In Splunk or ELK, one query on that ID reconstructs the whole attack chain. That is observability doing its job, turning a diffuse, unknowable incident into a single, legible event.
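Abbreviated, and with invented values, that chain might look like this in your index, every event joinable on the same trace_id:

{"trace_id": "abc-123-def", "scope_chain": "HTTP", "uri": "/users?id=1 OR 1=1", "sql_injection_pattern": true}
{"trace_id": "abc-123-def", "scope_chain": "HTTP>SQL", "sql_operation_type": "SELECT", "is_parameterized": false}
{"trace_id": "abc-123-def", "scope_chain": "HTTP>SQL>FileIO", "operation": "read", "canonical_path": "/etc/passwd", "is_sensitive": true}
{"trace_id": "abc-123-def", "scope_chain": "HTTP", "connection_direction": "outbound", "is_private_ip": false}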
Why AI Cares About Structure
AI-driven detection, whether it’s a classifier, an anomaly model, or an LLM agent acting as an analyst, wants consistent, typed features. That is literally what structured events are. A model trained on JVMXRay events can learn baseline distributions of sql_operation_type, typical scope_depth ranges, and expected connection_direction patterns, then flag deviations without a hand-built feature pipeline. Feed the same model unstructured text and you need an NLP layer just to extract features before detection can start, adding latency and an error surface.
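To make "consistent, typed features" concrete, here's a toy illustration (not a production detector) of the kind of baseline you can build directly from a typed field like sql_operation_type, no NLP extraction layer required:

import java.util.HashMap;
import java.util.Map;

public class FieldBaseline {
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;

    // Training phase: count observed values of a categorical field.
    public void observe(String value) {
        counts.merge(value, 1, Integer::sum);
        total++;
    }

    // Detection phase: flag values seen in less than `threshold` of traffic.
    public boolean isAnomalous(String value, double threshold) {
        int seen = counts.getOrDefault(value, 0);
        return total > 0 && (double) seen / total < threshold;
    }

    public static void main(String[] args) {
        FieldBaseline baseline = new FieldBaseline();
        // Toy traffic: almost all SELECTs, a handful of UPDATEs.
        for (int i = 0; i < 990; i++) baseline.observe("SELECT");
        for (int i = 0; i < 10; i++) baseline.observe("UPDATE");
        System.out.println(baseline.isAnomalous("SELECT", 0.05)); // false
        System.out.println(baseline.isAnomalous("DDL", 0.05));    // true: never seen before
    }
}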
The story generalizes past JVMXRay. Whatever your source of truth is, eBPF on Linux, OpenTelemetry for distributed traces, or cloud audit logs, the same rule applies. Structured, correlated events are what downstream AI can actually reason about. Unstructured text is where signal goes to die.
The Dual-Use Bonus
One advantage worth calling out: good observability has value beyond the security team. For example, JVMXRay emits time-series monitor events at configurable intervals with JVM health data. Following is an example,
C:AP | 2024.09.15 at 14:30:25 EDT | jvmxray.monitor-1 | INFO | org.jvmxray.events.monitor |
MemoryTotal=512MB|MemoryFree=256MB|MemoryMax=1GB|ThreadNew=0|ThreadRunnable=15|
ThreadBlocked=0|ThreadWaiting=8|ThreadTerminated=0|OpenFiles=42|ProcessCpuLoad=12.5%|
GCCount=25|GCTime=150ms|NonHeapUsed=64MB|DeadlockedThreads=0|
LogBufferUtilization=5%|LogQueueSize=50|LogDiscardCount=0|
mcc_contexts_created=1250|mcc_active_contexts=3|mcc_ttl_cleanups=0|
lib_static_loaded=45|lib_dynamic_loaded=2|lib_total_packages=128|lib_cache_size=47
The same stream a SOC uses to spot a DDoS-shaped memory spike is the one an SRE team uses to spot a memory leak. Memory utilization over time is valuable to security, but also to software development and engineering teams. Dual-use security tooling that makes the app team's life easier sees better adoption and helps build the security team's credibility. A security team perceived as a collaborative business partner and problem solver, instead of a firewall to progress, is better for the team and for the organization.
Where This Leaves AI Security
Input filtering, guardrails, and prompt injection defenses are all necessary, yet none is sufficient on its own. The defenses that survive long term will likely be layered on top of something that watches what the application actually does at runtime, because the only reliable place to catch a bypassed prompt injection defense is downstream, in the behavior it produces.
A good AI security story begins with observability, and observability begins with the truth about what the application is doing. For Java, and especially for legacy applications, the fastest path to that truth is an agent, whether it's JVMXRay or a tool like it, emitting structured events that super-charge the centralized logging stack you already own. Everything else, detection, automation, AI analysis, enforcement, is built on top of that foundation. You can't use AI to secure a problem you can't see.
References
- 10th Annual State of the Software Supply Chain Report, Sonatype, 2024
- Bypassing Prompt Injection and Jailbreak Detection in LLM Guardrails, arXiv, 2025
- OWASP Top 10 for LLM Applications, LLM01: Prompt Injection
- Elastic Common Schema (ECS), Elastic Documentation
- OpenTelemetry Logs, OpenTelemetry Documentation
- JVMXRay, Java Security Monitoring, GitHub