Ensure that AI systems implement prompt shielding, output redaction, and secure enclave inference to protect against prompt injection, malicious content leakage, and exposure of sensitive data.
Generative AI models are vulnerable to prompt injection attacks that bypass controls and extract sensitive information. Similarly, raw outputs may contain confidential or disallowed data if not filtered. Secure enclave inference ensures highly sensitive prompts and outputs are handled in isolated, tamper-resistant environments, reducing risk of leakage or compromise.
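As a minimal illustration of the output-redaction layer, the sketch below filters a model response through a set of regex patterns before it is returned to the caller. The patterns shown (email addresses and a hypothetical `sk-`-prefixed key format) are illustrative only; a production system would use a vetted DLP library and policy-driven pattern sets.

```python
import re

# Illustrative patterns only; real deployments need policy-managed rules.
REDACTION_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "api_key": re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{16,}\b"),
}

def redact_output(text: str) -> str:
    """Replace each sensitive match with a labelled placeholder."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text
```

The same filter can be applied symmetrically to inbound prompts as part of prompt shielding, before the text ever reaches the model.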
Ensure that AI systems undergo chaos engineering and adversarial stress testing to validate resilience against unexpected failures and malicious inputs.
Generative AI may fail under adversarial conditions or infrastructure stress. Chaos experiments and adversarial stress tests expose weaknesses, enabling proactive fixes before production failures or exploitation.
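One simple form of adversarial stress test is to drive the system's input guardrail with a corpus of hostile cases and confirm it fails closed rather than crashing. The `guardrail` callable below is a hypothetical stand-in for whatever validation layer sits in front of the model.

```python
# Representative adversarial cases; a real suite would be far larger.
ADVERSARIAL_CASES = [
    "",                                  # empty input
    "A" * 100_000,                       # oversized prompt
    "Ignore previous instructions ...",  # classic injection phrasing
    "\x00\x1b[2J",                       # control characters
    "{{7*7}}",                           # template-injection probe
]

def stress_test(guardrail) -> list[str]:
    """Run each case through a hypothetical `guardrail` callable and
    report the cases where it raised instead of failing closed."""
    failures = []
    for case in ADVERSARIAL_CASES:
        try:
            guardrail(case)
        except Exception:
            failures.append(repr(case[:40]))
    return failures
```

A non-empty failure list indicates the guardrail can be crashed by malformed input, which is exactly the weakness the control aims to surface before production.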
Ensure that all AI components undergo continuous security testing, maintain a Software Bill of Materials (SBOM), and are continuously monitored for Common Vulnerabilities and Exposures (CVEs).
Generative AI systems rely on large dependency chains (libraries, frameworks, drivers, and model artifacts). Without active testing and vulnerability tracking, these systems are exposed to supply-chain attacks, dependency exploits, and unpatched vulnerabilities. Continuous assurance strengthens the overall security posture.
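As a starting point for SBOM coverage of the Python layer of the stack, the sketch below enumerates installed distributions into a minimal name/version inventory. A real SBOM would be emitted in a standard format such as CycloneDX or SPDX and fed into a CVE-matching service; this only shows the inventory step.

```python
from importlib import metadata

def build_minimal_sbom() -> list[dict]:
    """List installed Python distributions as name/version entries,
    sorted by name. Not a full SBOM: no hashes, licenses, or transitive
    provenance, which CycloneDX/SPDX tooling would add."""
    entries = [
        {"name": dist.metadata["Name"], "version": dist.version}
        for dist in metadata.distributions()
    ]
    return sorted(entries, key=lambda e: (e["name"] or "").lower())
```

The resulting list is the input a CVE scanner would match against advisory feeds on every build and on a recurring schedule.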
Ensure that all communications between generative AI systems, services, and components are protected with Mutual Transport Layer Security (mTLS) to guarantee encryption, authentication, and integrity of data in transit.
Standard TLS ensures data encryption but authenticates only the server. Without mutual authentication, untrusted or rogue clients may connect to model APIs, inference services, or data pipelines. Enforcing mTLS ensures that both client and server identities are cryptographically validated, reducing risks of impersonation, unauthorized access, and man-in-the-middle attacks.
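The essential difference from server-only TLS is visible in how the server-side context is configured. The sketch below uses Python's standard `ssl` module; the three file paths are placeholders for the server keypair and the CA that issues certificates to authorized clients.

```python
import ssl

def require_client_certs(ctx: ssl.SSLContext) -> ssl.SSLContext:
    """Tighten a server-side context so both peers are authenticated."""
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid cert
    return ctx

def build_mtls_server_context(server_cert: str, server_key: str,
                              client_ca: str) -> ssl.SSLContext:
    """Assemble an mTLS server context from placeholder paths: the
    server's certificate chain and key, plus the client-issuing CA."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=server_cert, keyfile=server_key)
    # Trust only the CA that signs authorized client certificates.
    ctx.load_verify_locations(cafile=client_ca)
    return require_client_certs(ctx)
```

With `verify_mode` left at its default, the handshake would succeed for any client; setting `CERT_REQUIRED` is what turns server-only TLS into mutual TLS. In practice this is more often enforced at a service mesh or load balancer, but the handshake semantics are the same.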
Ensure that all data transferred to and from generative AI models occurs exclusively over private networks, eliminating reliance on the public internet.
Generative AI models often process sensitive prompts, training datasets, or business logic. Sending this data over public networks increases the risk of interception, man-in-the-middle attacks, or leakage of confidential information. Restricting transfers to private networks ensures that model inputs, outputs, and weights remain protected end-to-end.
Ensure that all generative AI model runtime environments are stateless and rely exclusively on ephemeral storage, in alignment with security and cloud best practices.
Persistent state or storage within model runtime environments increases the risk of sensitive data retention, unauthorized access, and non-compliance with data handling policies. By enforcing stateless execution and ephemeral storage, each model invocation is isolated, and no residual data remains after termination. This reduces attack surfaces, prevents data leakage, and supports regulatory requirements around data minimization and retention.
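At the application level, invocation-scoped scratch storage can be sketched as below. The `handler` callable is a hypothetical stand-in for the real inference entry point; the point is that its scratch directory exists only for the duration of one call.

```python
import tempfile

def run_inference_ephemeral(handler, prompt: str) -> str:
    """Run one model invocation with invocation-scoped scratch storage.
    `handler` is a hypothetical callable standing in for the real
    inference entry point."""
    # TemporaryDirectory deletes the directory and everything in it when
    # the block exits, so no residual data survives the invocation.
    with tempfile.TemporaryDirectory(prefix="inference-") as scratch:
        return handler(prompt, scratch_dir=scratch)
```

At the infrastructure level the same property is typically enforced with read-only root filesystems and tmpfs mounts, so the guarantee does not depend on application code alone.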
Ensure that all generative AI runtime environments and supporting infrastructure use minimal OS images (e.g., Alpine, Debian Minimal) to reduce attack surface and align with secure deployment best practices.
Full-featured or bloated operating system images contain unnecessary packages, libraries, and services that increase vulnerability exposure and maintenance overhead. Minimal OS images reduce the number of components that must be patched, hardened, and monitored, thereby lowering security risks and simplifying compliance efforts.
Ensure that all operating systems supporting generative AI model runtimes are hardened through timely patching, disabling of unnecessary services, and enforcement of baseline security controls (e.g., firewall rules), following industry best practices.
Unpatched or misconfigured operating systems introduce critical vulnerabilities that attackers can exploit to gain unauthorized access or disrupt model execution. Hardening reduces the attack surface, enforces consistent security posture, and ensures compliance with regulatory and industry standards.
Ensure that debugging and tracing tools are not attached to production model runtime environments unless explicitly authorized through controlled processes.
Generative AI runtimes process sensitive prompts, responses, and embeddings. Attaching debugging or tracing tools can expose raw data, system internals, and memory contents — creating opportunities for data leakage, model theft, or adversarial reverse engineering. Restricting their use ensures runtime confidentiality and preserves model integrity.
Ensure that debugging and tracing tools for generative AI model runtime environments are disabled by default and only used under a break-glass process within secure, hardened environments that prevent data exfiltration.
Debugging tools can expose sensitive prompts, responses, embeddings, and model internals. If misused, they create avenues for data leakage, model theft, insider abuse, or adversarial reverse engineering. Restricting and tightly controlling their use ensures that debugging supports operational needs without compromising security or compliance.
Ensure that generative AI model runtime environments operate without direct internet connectivity and are inaccessible to end users, in order to reduce the attack surface, prevent data exfiltration, and enforce strict access boundaries.
Internet access from runtime environments introduces risks of data leakage, exfiltration, and compromise through unverified external dependencies. Direct end-user access bypasses orchestration, monitoring, and security controls. Restricting both ensures that runtime environments can only interact with approved internal services and are fully mediated by secure APIs.
Ensure that generative AI models run on dedicated compute resources to minimize performance variability, reduce multi-tenancy risks, and strengthen data security and compliance.
Running models on shared compute introduces risks such as noisy-neighbor performance degradation, increased latency, and potential exposure to other tenants’ workloads. By isolating compute resources, the organization gains predictable performance, consistent availability, and stronger guarantees against data leakage — aligning with regulatory and security obligations.
Ensure that generative AI runtime environments are periodically recycled (i.e., torn down and redeployed) to minimize security risks, eliminate residual data, and maintain alignment with baseline configurations.
Long-lived runtime environments increase the risk of configuration drift, memory leaks, residual data exposure, and undetected compromise. By enforcing periodic recycling, organizations ensure environments remain consistent with hardened baselines, reduce attack persistence opportunities, and maintain compliance with secure cloud practices.
Ensure that generative AI runtime execution environments (e.g., vLLM, TensorRT-LLM, or equivalent inference engines) are securely configured, isolated, and aligned with organizational security baselines to minimize risks associated with model execution.
Runtime frameworks accelerate inference and manage memory, batching, and scheduling. However, misconfigured runtimes may expose sensitive data, over-allocate system resources, or create opportunities for privilege escalation. Standardizing and securing execution environments ensures predictable performance, prevents data leakage, and aligns with compliance requirements.
Ensure that generative AI workloads are executed in sandboxed environments that are properly orchestrated and bound to managed sessions, in order to isolate users, enforce access controls, and prevent persistence of data or processes beyond their authorized lifecycle.
Without sandboxing and session management, workloads may share resources, leak data between users, or persist beyond intended lifetimes. Orchestrated sandbox environments ensure isolation, accountability, and secure teardown of sessions — reducing risks of privilege escalation, data leakage, and compliance violations.
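The session-binding and guaranteed-teardown requirement can be sketched with a context manager: the sandbox workspace exists only for the session's lifetime and is removed unconditionally on exit. This is a simplified model of what an orchestrator enforces, not a full sandbox implementation (which would also isolate processes, network, and memory).

```python
import shutil
import tempfile
from contextlib import contextmanager

@contextmanager
def managed_session(session_id: str):
    """Bind a workload to a session-scoped sandbox directory that is
    torn down unconditionally when the session ends (sketch only)."""
    workspace = tempfile.mkdtemp(prefix=f"session-{session_id}-")
    try:
        yield workspace
    finally:
        # Teardown runs even if the workload raises, so nothing
        # persists beyond the authorized session lifecycle.
        shutil.rmtree(workspace, ignore_errors=True)
```

The `finally` clause is the load-bearing part: teardown must not depend on the workload completing cleanly.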
Ensure that global or non-session-specific caches are disabled in model runtime environments to prevent unintended data persistence, cross-user data exposure, and leakage of sensitive information across sessions.
Global caches or shared memory pools can cause data from one user’s interaction to be reused in another session, creating serious risks of data leakage, prompt/response exposure, and compliance violations. Disabling non-session-specific caches enforces data isolation, supports stateless execution, and ensures sensitive data is not retained beyond the user’s session.
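The isolation property the control requires, namely that a cache lookup can never return another session's data, can be sketched by keying every cache operation on the session identifier and discarding the whole partition at session end.

```python
class SessionScopedCache:
    """Cache partitioned by session: entries are invisible across
    sessions and dropped at session end (illustrative sketch)."""

    def __init__(self) -> None:
        self._by_session: dict[str, dict] = {}

    def put(self, session_id: str, key: str, value) -> None:
        self._by_session.setdefault(session_id, {})[key] = value

    def get(self, session_id: str, key: str, default=None):
        # Lookups only ever see the caller's own partition.
        return self._by_session.get(session_id, {}).get(key, default)

    def end_session(self, session_id: str) -> None:
        # All cached data for the session is discarded at teardown.
        self._by_session.pop(session_id, None)
```

The same principle applies to lower-level caches (e.g., KV caches or prefix caches in inference engines), which must likewise be scoped to a single session or disabled.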
Maintain a central repository of AI risk decisions, rationales, and approvals to ensure auditability and organizational learning.
Financial regulators expect documented decision-making around AI risks. Without centralized records, firms face audit gaps, fragmented governance, and knowledge loss.
Maintain a coordinated vulnerability disclosure program and transparency portals to handle security findings related to AI systems.
AI systems may have vulnerabilities in APIs, data handling, or model behavior. Without disclosure programs and transparency, fintech firms risk regulatory action, reputational harm, and delayed remediation.
Maintain up-to-date architecture diagrams, dataflow documentation, and threat models for all AI systems.
In finance, regulators expect firms to demonstrate clear traceability of data and system design. Without documentation and threat modeling, institutions face risks of data leakage, security gaps, and poor audit readiness.
Operate an end-to-end safety management system for AI aligned with recognized industry and regulatory standards.
Financial regulators (e.g., OCC, FCA, EU AI Act) increasingly expect AI to be managed under safety frameworks comparable to operational risk standards. End-to-end systems ensure safety is embedded across the lifecycle — from data to deployment.