What Happens When You Point Offline AI at Critical Infrastructure Software

Proactive vulnerability discovery for the software that critical infrastructure depends on.

Why This Matters Now

In December 2024, U.S. officials confirmed that Salt Typhoon had compromised at least nine U.S. telecommunications providers by exploiting vulnerabilities in network infrastructure that had been left unpatched for months. The attackers gained access to call records, live wiretap systems, and law enforcement surveillance data. The vulnerabilities were known; they simply had not been fixed fast enough.

This is the pattern that keeps repeating. Volt Typhoon pre-positions in critical infrastructure such as water utilities, energy grids, and transportation systems. Healthcare organisations face ransomware campaigns that exploit VPN appliance flaws disclosed months earlier. Financial institutions discover that the endpoint security agents protecting their networks contain the very vulnerabilities attackers need to establish a foothold.

The common thread is not a lack of security tools. It is a throughput problem. There are more software targets than security researchers. More code ships every day than can be audited manually. Adversaries, including state-sponsored groups, ransomware operators, and initial access brokers, are already automating their vulnerability discovery at scale.

The question is not whether AI will be used to find software flaws. It is whether defenders or attackers get there first.

Our Approach

We have built an automated vulnerability discovery pipeline that uses offline AI models combined with multi-method analysis to find, validate, and responsibly disclose vulnerabilities in the software that critical infrastructure depends on, before adversaries can weaponise them.

The key word is offline. Target firmware, proprietary binaries, and decompiled source code never leave our analysis environment. There are no cloud APIs and no third-party data processing. This is not simply a technical preference. When analysing VPN appliances deployed across banking networks, endpoint agents protecting hospital systems, or firmware running in industrial control environments, keeping analysis air-gapped is both a legal and an ethical requirement.

A note on transparency: the AI does not discover vulnerabilities through reasoning alone. The security toolchain does the heavy lifting. Fuzzers crash programs, decompilers reveal unsafe code paths, and pattern engines flag dangerous constructs. The AI accelerates the researcher. It does not replace the toolchain, and it does not replace the researcher’s judgement. The specific steps where the AI contributes are described later in this post.

How It Works

The pipeline orchestrates six stages, each combining AI models with deterministic security analysis.

Intelligence Ingestion. Continuous monitoring of vulnerability intelligence feeds, advisories, and patch disclosures. The AI prioritises which newly disclosed vulnerabilities are likely to have undiscovered variants in related software. When a buffer overflow is found in one DNS implementation, every other DNS implementation needs checking.
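As a rough illustration of that prioritisation step, the sketch below ranks advisories by how many targets in the registry share the affected component class, so a DNS flaw outranks a flaw in a component with no sibling implementations. It is a deterministic stand-in, not the AI model the pipeline actually uses; `Advisory`, the component labels, and `rank_by_variant_potential` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    advisory_id: str  # hypothetical identifier
    component: str    # component class, e.g. "dns" or "vpn" (assumed labels)
    flaw_class: str   # e.g. "buffer-overflow"


def rank_by_variant_potential(advisories, registry):
    """Order advisories by how many registry targets share the affected component.

    `registry` maps each target name to its component class. A flaw in a
    component with many sibling implementations has more places to recur.
    """
    targets_per_component = Counter(registry.values())
    return sorted(
        advisories,
        key=lambda adv: targets_per_component[adv.component],
        reverse=True,  # sorted() stays stable, so ties keep their input order
    )


registry = {"dns-a": "dns", "dns-b": "dns", "dns-c": "dns", "vpn-a": "vpn"}
advisories = [
    Advisory("ADV-1", "vpn", "auth-bypass"),
    Advisory("ADV-2", "dns", "buffer-overflow"),
]
print([a.advisory_id for a in rank_by_variant_potential(advisories, registry)])
# ['ADV-2', 'ADV-1']
```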

Pattern Extraction. For each prioritised vulnerability, the pipeline extracts the underlying flaw pattern from patch diffs and advisories. The AI translates raw code changes into generalised detection signatures, understanding not just what was fixed but what class of mistake produced the vulnerability.
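A heavily simplified, non-AI version of pattern extraction can still show the idea: read a unified diff, find unsafe calls that the patch removed and did not re-add, and turn them into a reusable detection regex. The fixed `UNSAFE_CALLS` list and `extract_signature` are hypothetical; a model that generalises the flaw class would cover far more than a list of function names can.

```python
import re

# Hypothetical shortlist of C calls often involved in memory-safety fixes.
UNSAFE_CALLS = ("strcpy", "strcat", "sprintf", "gets", "memcpy")


def extract_signature(unified_diff):
    """Return a regex for unsafe calls the patch removed and did not re-add, or None."""
    removed, added = set(), set()
    for line in unified_diff.splitlines():
        if line.startswith(("---", "+++")):
            continue  # file headers, not code changes
        if line.startswith("-"):
            bucket = removed
        elif line.startswith("+"):
            bucket = added
        else:
            continue  # context line or hunk header
        for name in UNSAFE_CALLS:
            # \b stops "gets" matching inside "fgets"
            if re.search(rf"\b{name}\s*\(", line):
                bucket.add(name)
    dropped = sorted(removed - added)
    if not dropped:
        return None
    return re.compile(r"\b(?:" + "|".join(dropped) + r")\s*\(")


patch = """\
--- a/resolver.c
+++ b/resolver.c
@@ -10,1 +10,1 @@
-    strcpy(name, label);
+    strncpy(name, label, sizeof(name) - 1);
"""
signature = extract_signature(patch)
print(signature.pattern)  # \b(?:strcpy)\s*\(
```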

Variant Hunting. Cross-implementation scanning across a registry of targets spanning network infrastructure, endpoint security agents, VPN clients, firmware images, and protocol implementations. The same vulnerability pattern that appeared in one vendor’s code often exists in another’s: a different codebase, but the same mistake.
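Once a signature exists, variant hunting amounts to running it over every implementation in the registry. The sketch below does this with a line-by-line regex scan over in-memory source text; `hunt_variants` and the target names are hypothetical, and a production scanner would more likely use a semantic code-query engine than raw regular expressions.

```python
import re


def hunt_variants(signature, implementations):
    """Scan every implementation for a signature.

    `implementations` maps a target name to its source text (decompiled
    output works too). Returns (target, line number, matching line) tuples.
    """
    hits = []
    for target, source in implementations.items():
        for lineno, line in enumerate(source.splitlines(), start=1):
            if signature.search(line):
                hits.append((target, lineno, line.strip()))
    return hits


implementations = {
    "dns-impl-a": "void copy(char *d, const char *s) {\n    strcpy(d, s);\n}\n",
    "dns-impl-b": "void copy(char *d, const char *s, size_t n) {\n    strncpy(d, s, n);\n}\n",
}
for hit in hunt_variants(re.compile(r"\bstrcpy\s*\("), implementations):
    print(hit)  # ('dns-impl-a', 2, 'strcpy(d, s);')
```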

Deep Analysis. Multi-method analysis tailored to each target type. Source code is fuzz-tested with AI-generated harnesses. Compiled binaries are decompiled and pattern-matched. Firmware images are extracted, emulated, and probed. Windows executables are disassembled and cross-referenced for dangerous API patterns. No single method covers everything; the pipeline selects the right approach for each target automatically.
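To make "fuzz-tested with harnesses" concrete, here is a minimal mutation fuzzer in pure Python. It is a teaching analogue only: real C/C++ targets would be fuzzed with a coverage-guided fuzzer, the harness here is hand-written rather than AI-generated, and `parse_record` is a hypothetical toy with a planted length-trust bug. An uncaught exception plays the role of a crash.

```python
import random


def parse_record(data: bytes) -> int:
    """Hypothetical toy target: byte 0 declares how many body bytes follow."""
    if not data:
        return 0
    length = data[0]
    body = data[1:]
    return body[length - 1]  # planted bug: trusts the declared length


def harness(data: bytes) -> None:
    """Glue code that feeds raw fuzzer bytes into the target."""
    parse_record(data)


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value (data must be non-empty)."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, iterations: int = 1000, rng_seed: int = 0) -> dict:
    """Mutate the seed repeatedly; keep the first crashing input per exception type."""
    rng = random.Random(rng_seed)
    crashes = {}
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            harness(candidate)
        except Exception as exc:  # an uncaught exception stands in for a crash
            crashes.setdefault(type(exc).__name__, candidate)
    return crashes


for kind, data in fuzz(b"\x03abc").items():
    print(kind, data)
```

Only a first byte larger than 3 can trigger the bug here, so every crashing input the fuzzer keeps should have that property, which is a useful sanity check when adapting the sketch.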

Automated Validation. Every finding receives dynamic confirmation before it is considered real. Static analysis produces noise. Our validator framework connects to actual running software, replays inputs, and proves each issue end-to-end. This is the difference between a scanner report and a credible disclosure.
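The core of dynamic confirmation is replay: run the real binary on the crashing input several times and require it to die the same way every time. The sketch below is POSIX-only and assumes the target reads its input from stdin; `reproduces` and its parameters are hypothetical. It relies on `subprocess.run` reporting death by signal as a negative return code, which holds only when the target is launched directly rather than through a shell.

```python
import subprocess
import sys


def reproduces(cmd, crash_input, runs=3, timeout=10.0):
    """Replay `crash_input` on stdin; True only if every run dies by the same signal.

    On POSIX, subprocess.run reports death by signal N as return code -N
    (for example -11 for SIGSEGV). A normal non-zero exit means the target
    handled the input, so it does not count as a confirmed crash.
    """
    signals_seen = set()
    for _ in range(runs):
        try:
            result = subprocess.run(cmd, input=crash_input, capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False  # a hang may be interesting, but it is a different finding
        if result.returncode >= 0:
            return False
        signals_seen.add(-result.returncode)
    return len(signals_seen) == 1


# Stand-in target that aborts (SIGABRT) on any input; a real run would invoke the vendor binary.
print(reproduces([sys.executable, "-c", "import os; os.abort()"], b"anything"))  # True
```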

Responsible Disclosure. Automated CVSS scoring, disclosure document preparation, and vendor notification drafting, all structured for coordinated disclosure from day one.
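Automated CVSS scoring is the most mechanical step in this stage. The post does not say which CVSS version the pipeline uses; assuming version 3.1, the base score follows directly from the weights and equations in the FIRST specification, as sketched below. Only base metrics are handled (temporal and environmental metrics are omitted), and `base_score` expects the `CVSS:3.1/` prefix on the vector string.

```python
import math

# CVSS v3.1 base-metric weights from the FIRST specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, robust to float error."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score from a vector such as 'CVSS:3.1/AV:N/AC:L/...'."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

That example vector describes a network-reachable flaw needing no privileges or user interaction, with full impact on confidentiality, integrity, and availability.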

What Makes This Different

Offline by design. The AI models run on local hardware. Analysis environments are air-gapped. This enables responsible research on software that cannot be uploaded to cloud services: the same VPN appliances, endpoint agents, and network infrastructure that Salt Typhoon and Volt Typhoon target.

Multi-method, not single-trick. Most AI security tools do one thing. We route each target to the appropriate analysis pipeline: fuzz testing for C/C++ libraries, bytecode decompilation for managed-code agents, binary analysis for Windows executables, and firmware extraction and emulation for embedded devices. The vulnerabilities we have found span all of these categories. No single method would have caught them all.
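A first-pass router can choose among these methods from the file suffix and its leading bytes. The sketch assumes a few well-known signatures (`MZ` for Windows PE executables, `0xCAFEBABE` for Java class files, `hsqs` for little-endian SquashFS firmware images); the method names and `route` itself are hypothetical.

```python
from pathlib import Path

# Hypothetical routing table: leading bytes -> analysis method named in the post.
MAGIC_ROUTES = [
    (b"MZ", "windows-binary-analysis"),                # PE/DOS executables
    (b"\xca\xfe\xba\xbe", "bytecode-decompilation"),   # Java class files
    (b"hsqs", "firmware-extraction-emulation"),        # little-endian SquashFS images
]
SOURCE_SUFFIXES = {".c", ".cc", ".cpp", ".h", ".hpp"}


def route(path: Path, head: bytes) -> str:
    """Pick an analysis method from the file suffix and its first bytes."""
    if path.suffix.lower() in SOURCE_SUFFIXES:
        return "fuzz-testing"  # C/C++ source: build harnesses and fuzz
    for magic, method in MAGIC_ROUTES:
        if head.startswith(magic):
            return method
    return "manual-triage"  # unknown format: hand it to a researcher


print(route(Path("agent.exe"), b"MZ\x90\x00"))  # windows-binary-analysis
```

Signatures alone are only a starting point: .NET assemblies are also PE files beginning with `MZ`, and `0xCAFEBABE` is shared with Mach-O universal binaries, so a real router would need deeper inspection before committing to a method.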

Validation-first. Static findings are cheap. Anyone can run a scanner and get hundreds of hits. The hard part is knowing which ones are real. Our framework dynamically confirms each finding against actual running software: not test harnesses or simulations, but the real vendor binaries.

Continuous learning. The pipeline tracks analysis quality over time. When a fuzz harness produces false positives (crashes that do not reproduce against the real binary), the system penalises that approach and adjusts future analysis accordingly. This lesson came from real experience: early findings that appeared critical turned out to be mismatches in the analysis environment rather than real bugs. That failure mode is now built into the pipeline’s memory.
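The post does not describe how approaches are penalised. One simple scheme, sketched below under that assumption, is to track a Laplace-smoothed reproduction rate per approach, (confirmed + 1) / (total + 2), so a new approach starts at 0.5 and every crash that fails to reproduce on the real binary pulls its score down. `ApproachTracker` and the approach names are hypothetical.

```python
from collections import defaultdict


class ApproachTracker:
    """Track how often each analysis approach's crashes reproduce on real binaries."""

    def __init__(self):
        self.confirmed = defaultdict(int)
        self.total = defaultdict(int)

    def record(self, approach: str, reproduced: bool) -> None:
        """Log one crash from `approach` and whether it reproduced on the real target."""
        self.total[approach] += 1
        if reproduced:
            self.confirmed[approach] += 1

    def score(self, approach: str) -> float:
        """Laplace-smoothed reproduction rate; unseen approaches score 0.5."""
        return (self.confirmed[approach] + 1) / (self.total[approach] + 2)

    def rank(self, approaches):
        """Order candidate approaches, most trustworthy first."""
        return sorted(approaches, key=self.score, reverse=True)


tracker = ApproachTracker()
for _ in range(3):
    tracker.record("harness-a", reproduced=False)  # environment-mismatch crashes
tracker.record("harness-b", reproduced=True)
print(tracker.rank(["harness-a", "harness-b"]))  # ['harness-b', 'harness-a']
```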

Early Results

In under 48 hours of focused execution, the pipeline identified 22 dynamically confirmed vulnerabilities, each a potential CVE, across six targets spanning endpoint security, network infrastructure, enterprise VPN, and IoT firmware. Additional findings are still pending validation.

Each confirmed finding was verified against the real running software, not simulated environments. Multiple analysis methods were required; no single technique would have uncovered the full set.

Coordinated disclosure is in progress. Specific details are intentionally withheld.

Vulnerability Research at Scale

Those 48 hours were not a one-off sprint. The pipeline runs continuously, around the clock, moving from target to target and cycling through its registry of software that critical infrastructure depends on. While one campaign fuzzes a network service across nine parallel cores, another is decompiling a freshly downloaded endpoint agent, and a third is validating the previous week’s findings against a new firmware release.

This is what changes about vulnerability research when it is automated. A human researcher picks a target, spends days on it, writes up findings, and moves on. The pipeline does not context-switch. It runs every target in the registry on a rolling basis, re-analysing after vendor updates, re-scanning when new vulnerability patterns emerge, and re-validating when new software versions ship. A vulnerability that did not exist in version 4.0 might appear in 4.1. A pattern that was benign in one library becomes exploitable when a dependency changes. Continuous analysis catches what periodic audits miss.
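The rolling re-analysis described above reduces to a scheduling rule. In the sketch below, never-analysed targets come first, then targets whose vendor has shipped a new version, then targets last scanned before the newest vulnerability patterns arrived. The `Target` fields, the three-tier ordering, and `reanalysis_queue` are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Target:
    name: str
    current_version: str               # latest version the vendor has shipped
    analysed_version: Optional[str]    # version last analysed, None if never
    last_analysed: Optional[datetime]  # when that analysis ran


def reanalysis_priority(t: Target, patterns_updated: datetime) -> Optional[int]:
    """Lower is more urgent; None means the target is up to date."""
    if t.last_analysed is None:
        return 0  # never analysed
    if t.current_version != t.analysed_version:
        return 1  # vendor shipped a new version since the last run
    if t.last_analysed < patterns_updated:
        return 2  # new vulnerability patterns arrived after the last run
    return None


def reanalysis_queue(targets, patterns_updated: datetime) -> list:
    """Names of targets due for analysis, most urgent first."""
    due = []
    for t in targets:
        priority = reanalysis_priority(t, patterns_updated)
        if priority is not None:
            # Within a tier, the longest-neglected target goes first.
            due.append((priority, t.last_analysed or datetime.min, t.name))
    due.sort()
    return [name for _, _, name in due]


targets = [
    Target("fresh", "4.1", "4.1", datetime(2025, 1, 9)),
    Target("updated", "4.1", "4.0", datetime(2025, 1, 5)),
    Target("new", "1.0", None, None),
    Target("stale", "2.0", "2.0", datetime(2025, 1, 1)),
]
print(reanalysis_queue(targets, patterns_updated=datetime(2025, 1, 8)))
# ['new', 'updated', 'stale']
```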

The 22 findings in 48 hours are a snapshot. The pipeline does not stop when the blog post goes up. It is analysing the next target right now.

What the AI Does (and Does Not Do)

The AI accelerates three specific steps in the pipeline:

Pattern translation. Reading a vulnerability patch and articulating the flaw class in terms that generate detection rules. A researcher does this in 20 minutes. The AI does it in seconds.
Analysis scaffolding. Generating the setup code that initialises target software for testing. This involves reading thousands of lines of initialisation logic and producing correct test configurations, saving hours per target.
Triage prioritisation. Classifying findings and recommending where human attention should focus. The accuracy is roughly 80%, which is imperfect but sufficient to cut manual review workload significantly.
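The post does not say how the roughly 80% triage figure was measured. One plausible way, sketched below, is to compare the model's labels with researcher-confirmed outcomes. Reporting recall alongside accuracy shows how many genuine issues the model would have pushed down the queue, which is the error that matters most in triage. `triage_agreement` and the label format are hypothetical.

```python
def triage_agreement(predicted, confirmed):
    """Compare model triage labels with researcher-confirmed outcomes.

    Both dicts map a finding ID to True ("worth human attention") or False.
    Only findings present in both dicts are scored.
    """
    ids = predicted.keys() & confirmed.keys()
    tp = sum(predicted[i] and confirmed[i] for i in ids)
    tn = sum(not predicted[i] and not confirmed[i] for i in ids)
    fp = sum(predicted[i] and not confirmed[i] for i in ids)
    fn = sum(not predicted[i] and confirmed[i] for i in ids)
    return {
        "accuracy": (tp + tn) / len(ids) if ids else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


predicted = {"f1": True, "f2": True, "f3": False, "f4": False, "f5": True}
confirmed = {"f1": True, "f2": False, "f3": False, "f4": True, "f5": True}
print(triage_agreement(predicted, confirmed))
```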

Every confirmed finding came from a concrete execution against real software. The AI accelerates the workflow. It does not replace the toolchain, and it does not replace the researcher’s judgement.

Responsible Disclosure

Every finding follows coordinated disclosure. This is non-negotiable.

Vendor notification first. Always. No exceptions.
90-day standard timeline. Vendors receive 90 days to develop and ship patches before any public detail.
No exploit code published. Proof-of-concept validations exist for confirmation only, shared privately with vendors.
Abstract reporting. This post intentionally omits vendor names, specific vulnerability details, and exploitation techniques.
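The 90-day rule is simple date arithmetic, but encoding it keeps disclosure tracking consistent across many open cases. A minimal sketch follows; `disclosure_deadline` and `days_remaining` are hypothetical helpers.

```python
from datetime import date, timedelta

DISCLOSURE_WINDOW = timedelta(days=90)  # the standard timeline stated above


def disclosure_deadline(notified: date) -> date:
    """Earliest date on which public detail may be released."""
    return notified + DISCLOSURE_WINDOW


def days_remaining(notified: date, today: date) -> int:
    """Days left in the vendor's patch window (0 once it has elapsed)."""
    return max((disclosure_deadline(notified) - today).days, 0)


print(disclosure_deadline(date(2025, 1, 1)))               # 2025-04-01
print(days_remaining(date(2025, 1, 1), date(2025, 3, 2)))  # 30
```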

The goal is to get vulnerabilities fixed, not to demonstrate capability, not to pressure vendors publicly, and not to enable harm. If a vendor patches silently and no CVE is ever assigned, that is a success. The software is safer. That is the only metric that matters.

What’s Next

The pipeline continues to expand. Windows dynamic validation is in progress, running vendor software in isolated sandbox environments and confirming that static findings are exploitable at runtime. New target categories are being onboarded, including cloud-native infrastructure, container orchestration platforms, and mobile security agents.

The longer-term vision is continuous, autonomous defence. The system monitors the vulnerability landscape daily, analyses new disclosures for variant potential, tests every target in the registry, validates findings, and queues disclosure drafts for human review. The researcher becomes the decision-maker, not the operator. The pipeline handles the scale.
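The human-in-the-loop boundary can be made explicit in code: the automated cycle ends by queuing drafts, and nothing reaches a vendor until a person acts on that queue. The sketch below wires the stages together as plain callables; `run_cycle`, the stage signatures, and the draft format are hypothetical.

```python
def run_cycle(targets, analyse, validate):
    """Run one automated cycle and return disclosure drafts awaiting human review.

    `analyse(target)` returns candidate findings; `validate(finding)` returns
    True only after dynamic confirmation. Nothing here contacts a vendor.
    """
    review_queue = []
    for target in targets:
        for finding in analyse(target):
            if validate(finding):
                review_queue.append(
                    {"target": target, "finding": finding, "status": "awaiting-review"}
                )
    return review_queue


drafts = run_cycle(
    ["svc-a", "svc-b"],
    analyse=lambda t: [f"{t}: crash-1", f"{t}: crash-2"],
    validate=lambda f: f.endswith("crash-1"),  # toy validator for the example
)
for draft in drafts:
    print(draft["finding"], draft["status"])
```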

The technology exists today. Adversaries are already using AI to automate offence. The only question is whether the defensive community adopts the same acceleration, and whether we get there before the next Salt Typhoon.

Work With Us

We are already partnering with organisations across critical infrastructure, financial services, healthcare, and telecommunications to integrate proactive vulnerability discovery into their security programmes. If your organisation would benefit from finding vulnerabilities in your software stack before an adversary does, we would welcome the conversation.

Reach out: Contact us.


Written by CyberDagger Engineering. All analysis is performed on legally obtained, publicly available software. Findings follow coordinated disclosure with standard 90-day timelines. No specific vulnerability details are disclosed in this post.
