
Product Vision

The AI Decision Audit Platform: Model-Agnostic Governance for Regulated Industries

The Problem No One Has Solved

Enterprises in banking, insurance, and financial services are deploying AI at scale. Credit decisions, claims adjudication, fraud detection, and compliance monitoring are increasingly assisted or automated by large language models from Anthropic, OpenAI, Google, Meta, and others.

The technology works. The governance does not.

Today, when an AI system denies a loan application or flags a transaction, the organization cannot reliably answer a simple question: why did the model make that decision?

The model's own explanation is not the answer. LLMs generate plausible narratives about their reasoning, but these are post-hoc rationalizations. They describe what the model said, not what it did. There is no verifiable link between the explanation and the actual internal process that produced the decision.

This is not a theoretical concern. The EU AI Act classifies credit scoring and insurance pricing as high-risk AI, requiring documented transparency and human oversight. Federal Reserve SR 11-7 mandates model risk management for any model used in decision-making. The CFPB requires specific adverse action reason codes for credit denials. State insurance regulators require demonstrable fairness in claims handling.

The current state of the industry: powerful models making consequential decisions, with no reliable way to audit why.

Why Existing Approaches Fall Short

Post-hoc explanation tools

SHAP, LIME, and similar tools were designed for traditional ML models. They approximate feature importance by perturbing inputs and observing output changes. Applied to LLMs, they operate on tokens rather than business concepts. Knowing that the token "denied" was influenced by the tokens "income" and "$42,000" is not a useful explanation for a compliance officer.

Model provider interpretability

Anthropic, OpenAI, and others are investing heavily in understanding their own models. This research is valuable but it does not solve the enterprise problem. Provider-level interpretability is designed for AI safety research, not for generating SR 11-7 validation reports. It works only on that provider's models. And the most capable models are closed-source, meaning no external party can inspect their internals regardless.

Observability platforms

Tools like Datadog, Arize, and Fiddler monitor model performance: latency, accuracy, drift. They answer "is the model working?" but not "why did it make this specific decision?" They are valuable for operations but do not satisfy regulatory requirements for decision-level auditability.

Internal data science teams

Many organizations attempt to build interpretability in-house. This typically results in custom scripts, one-off analyses, and manual compliance reporting that does not scale, does not cover all models, and does not survive staff turnover.

The BluelightAI Approach

BluelightAI's platform, Cobalt, is an AI decision audit system. It sits between your AI models and your compliance requirements, capturing every decision, testing why it was made, and producing the regulatory documentation you need. Cobalt is the product name for the full platform: the capture SDK, the immutable audit store, the analytics dashboard, and the topological pattern engine.

Three design principles define Cobalt.

Model-agnostic by design. The platform works with any LLM provider: Claude, GPT, Gemini, Llama, Mistral, or proprietary internal models. Most enterprises use multiple AI providers across different use cases. An audit platform that covers only one provider covers only a fraction of your decisions. BluelightAI covers all of them from a single pane of glass.

Invisible to end users. The platform integrates into your existing infrastructure with minimal code changes. Your loan officers, claims adjusters, and analysts continue using their current systems. BluelightAI operates in the background, capturing and enriching every decision. End users interact with it only when they need to understand a specific decision.

Interpretability at every layer. Cobalt combines multiple interpretability methods depending on what the deployment allows. For closed-source API models, the platform uses structured input decomposition, targeted perturbation testing, cross-model validation, and topological pattern analysis. For open-weight models such as Qwen, BluelightAI goes further: our research team trains Cross-Layer Transcoders (CLTs) that decompose model computations into interpretable features, providing direct mechanistic evidence for how a decision was reached. Both layers produce explanations in business terms, and the platform does not break when models are updated or replaced. The result is a system that works across any model provider while extracting the deepest possible understanding from each.

Technical Architecture

The platform consists of four components.

1. The Capture SDK

A lightweight library that wraps your existing LLM API calls. Integration requires changing approximately five lines of code in your inference pipeline.
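
As a minimal sketch of what that integration could look like, the Python below wraps a generic model call and performs the capture steps described in the next paragraphs. The CobaltAuditor class, its method names, and the audit store and queue interfaces are illustrative placeholders, not the actual Cobalt SDK API.

    # Illustrative sketch only: CobaltAuditor, its methods, and the store/queue
    # interfaces are hypothetical stand-ins, not the actual Cobalt SDK API.
    import time
    import uuid


    class CobaltAuditor:
        def __init__(self, audit_store, enrichment_queue):
            self.store = audit_store        # append-only audit store client
            self.queue = enrichment_queue   # background queue for perturbation tests

        def wrap(self, call_model, features: dict, prompt: str) -> str:
            """Wrap one LLM call: decompose inputs, forward, capture, enrich."""
            record_id = str(uuid.uuid4())
            # 1. Input decomposition: tag and store the structured features.
            self.store.append(record_id, stage="input", payload=features)
            # 2. Transparent forwarding: the underlying model call is unchanged.
            response = call_model(prompt)
            # 3. Response capture: log the raw response with a timestamp.
            self.store.append(record_id, stage="response",
                              payload={"raw": response, "ts": time.time()})
            # 4. Asynchronous enrichment: perturbation tests run off the decision path.
            self.queue.submit(record_id, features, prompt)
            return response

In production, call_model is simply the existing provider call (for example, a thin wrapper around client.messages.create(...) for Anthropic or client.chat.completions.create(...) for OpenAI), which is what keeps the integration to a handful of changed lines.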

The SDK performs four functions in sequence:

Input decomposition. Before the model is called, the SDK parses the structured data being sent: loan application fields, claim details, customer records, transaction data. Each input feature is tagged and stored. For BFSI use cases, inputs are already structured, so this step is straightforward.

Transparent forwarding. The SDK forwards the request to whatever model the application is configured to use. The response is returned to the calling system at normal speed. No latency is added to the decision path.

Response capture. The model's raw response, the extracted decision, the stated confidence, and any self-explanation are logged with a timestamp and immutable record ID.

Asynchronous enrichment. After the response is returned, the SDK runs targeted perturbation tests in a background queue:

  • Feature sensitivity analysis. Systematically vary individual input features and observe which ones change the decision. "If we remove the applicant's ZIP code, does the outcome change?" If yes, geographic information is driving the decision.
  • Counterfactual generation. For denials, identify the minimal input changes that would produce an approval. "This application would be approved if the DTI ratio were 37% instead of 42%." This directly produces the adverse action reason codes required by ECOA.
  • Cross-model validation. Run the same decision through a secondary model. Where models agree, confidence is high. Where they disagree, the decision boundary is uncertain and may warrant human review.
  • Proxy discrimination testing. Remove or neutralize features that correlate with protected classes and observe whether the decision changes. This is a direct test for disparate impact, run automatically on every decision.

These enrichment tests are configurable per use case. Credit underwriting tests different features than claims adjudication. The test suite evolves with each deployment as domain-specific knowledge accumulates.
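
To make the first two tests concrete, here is a simplified sketch in Python. The decide function stands in for a call that sends the structured features to the production model and returns "approved" or "denied"; the feature names, neutral values, and step size are illustrative, not a prescribed test suite.

    # Simplified sketch of feature sensitivity and counterfactual enrichment.
    # `decide`, the feature names, and the neutral values are illustrative.
    import copy


    def feature_sensitivity(decide, features: dict, neutral_values: dict) -> dict:
        """Replace each feature with a neutral value and see if the decision flips."""
        baseline = decide(features)
        flips = {}
        for name, neutral in neutral_values.items():
            perturbed = copy.deepcopy(features)
            perturbed[name] = neutral
            flips[name] = decide(perturbed) != baseline
        return flips  # e.g. {"zip_code": True} means geography is driving the decision


    def counterfactual_dti(decide, features: dict, step: float = 0.01) -> float | None:
        """For a denial, find the largest DTI ratio that would produce an approval."""
        if decide(features) == "approved":
            return None
        dti = features["dti_ratio"]
        while dti > 0:
            dti -= step
            candidate = dict(features, dti_ratio=round(dti, 2))
            if decide(candidate) == "approved":
                return candidate["dti_ratio"]
        return None

The dictionary returned by feature_sensitivity feeds the decision's audit record, and the threshold found by counterfactual_dti maps directly to an adverse action reason about debt obligations relative to income.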

2. The Audit Store

An immutable, append-only record of every AI decision and its interpretability artifacts.

Each record contains:

  • The structured input features sent to the model
  • The model provider, model version, and prompt template used
  • The raw model response and extracted decision
  • The perturbation test results and counterfactual analyses
  • Timestamps, record ID, and cryptographic hash for tamper evidence
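
A minimal sketch of what such a tamper-evident record could look like, with hash chaining to support the append-only property. The field names and hashing scheme are illustrative, not the actual Cobalt schema.

    # Illustrative tamper-evident audit record; field names and the
    # hash-chaining scheme are assumptions, not the Cobalt schema.
    import hashlib
    import json
    from dataclasses import asdict, dataclass, field


    @dataclass
    class AuditRecord:
        record_id: str
        timestamp: float
        model_provider: str
        model_version: str
        input_features: dict
        raw_response: str
        decision: str
        enrichment: dict = field(default_factory=dict)
        prev_hash: str = ""     # hash of the previous record in the append-only log
        record_hash: str = ""

        def seal(self) -> "AuditRecord":
            body = asdict(self)
            body.pop("record_hash")     # the hash covers everything except itself
            serialized = json.dumps(body, sort_keys=True).encode()
            self.record_hash = hashlib.sha256(serialized).hexdigest()
            return self

Because each record carries the hash of its predecessor inside its own hashed body, retroactively editing any stored decision breaks the chain from that point forward, which is what makes the store tamper-evident rather than merely access-controlled.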

The audit store can be deployed in two configurations:

Customer-hosted. The store runs within the customer's cloud environment (AWS, Azure, or GCP). BluelightAI provides infrastructure-as-code templates. Data never leaves the customer's VPC. This is the expected configuration for Tier 1 financial institutions.

BluelightAI-hosted. A fully managed, SOC 2-compliant service for organizations that are comfortable with an externally hosted deployment.

Records are encrypted at rest and in transit, access-controlled via the customer's identity provider, and retained for the full regulatory window. For banking, this is typically five to seven years. For insurance, retention periods vary by jurisdiction.

3. The Cobalt Dashboard

The user-facing layer of the Cobalt platform. A web application purpose-built for four distinct user personas in regulated financial institutions.

For frontline staff (claims adjusters, loan officers, underwriters): Cobalt provides an embeddable component (a link or panel) that integrates directly into the organization's existing workflow systems. When a customer asks why a claim was denied, the adjuster clicks "View Audit" within their current platform. A detail view opens showing the primary factors that drove the decision, the counterfactual analysis, and the full audit trail, all in plain business language.

Frontline staff do not learn a new tool. They access BluelightAI within the tool they already use, only when they need to.

For model risk managers: A dedicated dashboard showing model health across all AI systems in production. Drift detection, flagged decisions where perturbation tests revealed unexpected sensitivity, cross-model disagreement reports, and pattern alerts surfaced by the topological analysis engine.

For compliance officers: Pre-built regulatory report templates that auto-populate from the audit store. SR 11-7 model validation. EU AI Act Article 14 human oversight documentation. ECOA adverse action reason code distributions. NIST AI RMF mapping. State-specific insurance regulatory filings. A quarterly SR 11-7 report that currently requires weeks of manual work across multiple teams is reduced to an afternoon of review and approval.

When an examiner requests the decision trail for a specific loan application or claim from eighteen months ago, the compliance officer retrieves the complete record in seconds: inputs, model used, decision, perturbation results, and any human overrides.
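
A hypothetical retrieval path for that examiner request, reusing the record sketch from the audit store section: fetch by record ID and re-verify the tamper-evidence hash before the record is handed over. The store interface and field names are assumptions.

    # Hypothetical lookup against the audit store; store.get and the field
    # names are assumptions consistent with the record sketch above.
    import hashlib
    import json


    def retrieve_for_examiner(store, record_id: str) -> dict:
        record = store.get(record_id)       # the full decision record as a dict
        body = {k: v for k, v in record.items() if k != "record_hash"}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if recomputed != record["record_hash"]:
            raise ValueError("audit record failed tamper check")
        return record  # inputs, model version, decision, perturbation results, overrides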

For executives: A summary view showing total AI decisions by period and outcome, compliance posture across all frameworks, risk alerts and resolution status, and cost per decision by model provider. One page. Exportable for board reporting.

4. The Topological Pattern Engine

This is the component of the platform that no competitor offers. It is built on BluelightAI's proprietary topological data analysis technology, developed over two decades of mathematical research.

The pattern engine runs periodically (nightly or weekly) across the accumulated decision data in the audit store. It takes the structured input features and decision outcomes from thousands of decisions, builds embeddings, and applies the Mapper algorithm from computational topology to identify clusters and patterns that simple statistical methods miss.
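
As a toy illustration of the Mapper construction (not BluelightAI's proprietary engine), the sketch below covers a one-dimensional lens over the decision embeddings with overlapping intervals, clusters within each interval, and connects clusters that share decisions; the resulting graph of overlapping clusters is what exposes the multi-feature patterns described next.

    # Toy Mapper sketch over decision embeddings: lens, overlapping cover,
    # per-interval clustering, and edges between clusters that share records.
    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.decomposition import PCA


    def mapper_graph(embeddings: np.ndarray, n_intervals: int = 10, overlap: float = 0.3):
        lens = PCA(n_components=1).fit_transform(embeddings).ravel()
        lo, hi = lens.min(), lens.max()
        width = (hi - lo) / n_intervals
        nodes, edges = [], set()
        for i in range(n_intervals):
            start = lo + i * width - overlap * width
            end = lo + (i + 1) * width + overlap * width
            idx = np.where((lens >= start) & (lens <= end))[0]
            if len(idx) == 0:
                continue
            labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(embeddings[idx])
            for label in set(labels) - {-1}:
                nodes.append(set(idx[labels == label]))
        # Connect clusters (nodes) that share at least one decision record.
        for a in range(len(nodes)):
            for b in range(a + 1, len(nodes)):
                if nodes[a] & nodes[b]:
                    edges.add((a, b))
        return nodes, edges  # clusters of decisions and the overlaps between them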

What topological analysis finds that standard monitoring does not:

Non-obvious bias patterns. Not just "denial rate is higher for ZIP code X," which a SQL query can find, but "there is a cluster of applications where income is adequate, credit score is adequate, but the combination of self-employment, property type, and loan purpose creates a denial pattern that does not exist for W-2 employees." The Mapper algorithm identifies the shape of failure modes across multiple correlated features simultaneously.

Cross-model behavioral differences. "When these three input features co-occur, Claude approves and GPT denies. Neither is clearly wrong. The disagreement suggests this decision boundary is unstable and may benefit from a policy rule rather than model discretion."

Temporal drift that hides in averages. "The model's treatment of applications with student loan debt has shifted over the past six months. The aggregate denial rate is flat, but the specific subpopulation being denied has changed." Standard drift detection looks at distribution shifts in aggregate. Topological analysis finds shifts in the structure of the decision space.

These pattern insights surface as alerts in the model risk manager's dashboard, each linked to the specific cluster of decisions that triggered it and the perturbation test results that characterize it.

The Role of Mechanistic Interpretability Research

BluelightAI maintains an active research program in mechanistic interpretability, including training Cross-Layer Transcoders (CLTs) on open-weight models and contributing to open-source circuit tracing tools. This research serves a specific and important role within the Cobalt platform strategy.

What CLTs and circuit tracing provide

Mechanistic interpretability allows researchers to look inside a neural network and identify which specific internal features activate during a decision. Cross-Layer Transcoders decompose model computations into sparse, interpretable features that can be mapped to human-understandable concepts. Circuit tracing follows the causal chain from input tokens through attention heads and MLP layers to the final output, revealing not just what the model decided but how it arrived there.

BluelightAI has trained CLTs on Qwen3 models (0.6B and 1.7B parameters), extracting over 573,000 interpretable features per model, and published both the weights and an interactive explorer for the research community.

Where this fits in the Cobalt platform

For the majority of enterprise deployments, where customers use closed-source API models like Claude or GPT, Cobalt's behavioral interpretability layer (perturbation testing, counterfactual analysis, topological pattern detection) provides the audit capability. Model weights are not available and not required.

However, a growing number of enterprises are deploying open-weight models on their own infrastructure, particularly for use cases involving sensitive data that cannot leave the organization's environment. For these deployments, Cobalt offers a deeper tier of analysis. When model weights are accessible, the platform can capture CLT feature activations alongside the behavioral signals, providing ground-truth mechanistic evidence for why a decision was made rather than inferring it through perturbation alone.
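
A heavily simplified sketch of that deeper capture path, assuming a customer-hosted open-weight model. The forward-hook mechanics are standard transformers usage; the model identifier, layer choice, and the clt_encode placeholder for a trained Cross-Layer Transcoder encoder are assumptions, not the shipped implementation.

    # Simplified sketch of capturing activations for CLT feature decomposition.
    # The model id, layer index, and clt_encode placeholder are assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen3-0.6B"   # illustrative: any open-weight model the customer hosts
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    captured = {}

    def hook(module, inputs, output):
        # Keep the residual-stream activations for later feature decomposition.
        hidden = output[0] if isinstance(output, tuple) else output
        captured["acts"] = hidden.detach()

    layer = model.model.layers[6]    # one mid-depth decoder layer, for illustration
    handle = layer.register_forward_hook(hook)

    inputs = tokenizer("Applicant DTI 42%, income $85,000: approve or deny?",
                       return_tensors="pt")
    with torch.no_grad():
        model(**inputs)
    handle.remove()

    # clt_encode would map dense activations to sparse, interpretable features;
    # the real encoder comes from the trained CLT weights.
    # features = clt_encode(captured["acts"])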

This creates two tiers of interpretability within a single platform:

  • Standard (all models): Perturbation testing, counterfactual generation, cross-model validation, topological pattern analysis. Works with any model, closed or open.
  • Deep (open-weight models): All of the above, plus CLT feature decomposition, circuit-level attribution, and feature steering for targeted model behavior analysis. Available when the customer runs models on their own infrastructure.

How research improves the platform

The mechanistic interpretability research program is not separate from the product. It directly informs Cobalt's design in three ways.

First, understanding how models actually process information internally helps BluelightAI design better perturbation tests. Knowing which features tend to be causally important for credit decisions, rather than merely correlated, makes the behavioral testing layer more precise even when applied to closed-source models.

Second, as frontier labs begin exposing interpretability signals through their APIs (a trend that is already underway), Cobalt is positioned to ingest those signals immediately. The research team's familiarity with CLTs, sparse autoencoders, and circuit tracing means the platform can integrate new interpretability data sources as they become available, without rebuilding the architecture.

Third, the research establishes BluelightAI's technical credibility with the exact audience the product serves. Risk officers and chief data officers at financial institutions need confidence that the company behind their audit platform genuinely understands model internals at the deepest level, not just surface-level input/output testing. Published CLT research and open-source contributions provide that confidence.

Deployment Model

Weeks 1 to 4. SDK integration, audit store deployment, input decomposition configuration for the first use case. By the end of this phase, every AI decision in the target workflow is being captured with a structured audit record.

Weeks 5 to 8. Perturbation test suite configuration, counterfactual analysis tuning, cross-model validation setup. The enrichment pipeline is producing meaningful interpretability artifacts for every decision.

Weeks 9 to 12. Dashboard deployment, compliance report template configuration, SSO integration, frontline embed setup. All four user personas have access. The first regulatory reports can be generated.

Month 4 onward. Topological pattern engine activated as decision volume accumulates. Ongoing tuning of perturbation tests based on findings. Expansion to additional use cases and model providers.

Integration requirements

  • Access to the API call layer where the organization's systems invoke LLM providers. The SDK wraps these calls.
  • A cloud environment (AWS, Azure, or GCP) for the audit store, or willingness to use BluelightAI's managed service.
  • SSO/OIDC identity provider for dashboard authentication.
  • No access to model weights, training data, or model provider infrastructure is required.

Defensibility

Against model provider interpretability

When Anthropic, OpenAI, or Google release interpretability features for their own models, those features will work only with that provider's models and produce that provider's format of explanation. BluelightAI normalizes interpretability across all providers into a single audit standard. The more providers a customer uses, the more valuable a model-agnostic platform becomes.

Provider interpretability features, when they arrive, become an additional input to BluelightAI's enrichment pipeline rather than a replacement for it.

Against observability platforms

Observability tools monitor whether models are working. BluelightAI explains why models made specific decisions. These are complementary capabilities. BluelightAI's perturbation testing, counterfactual generation, and topological pattern analysis operate at a different layer than performance monitoring.

Against in-house builds

The platform's value compounds with usage. After twelve months, a customer has a complete audit trail of every AI decision, a library of perturbation tests tuned to their specific use cases, topological pattern models trained on their decision data, and compliance report templates configured for their regulatory environment. Replicating this internally means not just building the software but also accumulating a year of operational data and domain-specific configuration.

Against model churn

When a customer switches from one LLM provider to another, or upgrades to a new model version, the audit platform continues to function without modification. The capture SDK adapts to the new provider. The perturbation tests run the same way. The audit store maintains continuity across model changes. The compliance reports do not need to be reconfigured.

The platform's value is independent of which models are in use at any given time.

The Regulatory Landscape

The regulatory environment is moving decisively toward mandated AI transparency in financial services.

EU AI Act (high-risk obligations apply from August 2026). Classifies credit scoring and insurance pricing as high-risk AI systems. Requires providers to implement transparency measures, maintain technical documentation, enable human oversight, and keep records for regulatory inspection. Article 14 specifically mandates that high-risk systems be designed to allow effective human oversight during operation.

Federal Reserve SR 11-7. Requires banks to maintain model risk management frameworks for any model used in decision-making. This includes documentation of model purpose and design, ongoing monitoring of model performance, validation by parties independent of model development, and clear governance of model changes.

CFPB and ECOA. When a credit application is denied, the lender must provide specific reasons for the adverse action. Generating these reason codes from an LLM's output is not trivial. BluelightAI's counterfactual analysis directly identifies which input factors would change the decision, mapping cleanly to adverse action reason codes.

State insurance regulations. Most states require insurers to demonstrate that claims handling is fair, timely, and well-documented. AI-assisted claims decisions are increasingly subject to the same scrutiny as human decisions.

Organizations that deploy AI without a robust audit trail are not just accepting operational risk. They are building a compliance liability that grows with every decision the model makes.

Summary

BluelightAI is building the governance infrastructure for enterprise AI in regulated industries.

The platform captures every AI decision, tests why it was made, stores an immutable audit record, and produces the regulatory documentation that compliance and risk teams need. It works across any model provider, without requiring access to model internals.

The core technical differentiators are behavioral interpretability through structured perturbation testing, which works with any model including closed-source APIs, and topological pattern analysis, which finds systematic issues across large decision populations that no other approach can detect.

The result is a platform where every stakeholder, from the frontline analyst to the board member to the regulator, can understand, verify, and defend the AI decisions their organization makes.