GAMP 5 & the ISPE AI Guide: Translating the 290-Page Framework into Actionable Validation
The most consequential AI compliance document of the last two years was not published by a regulator. It was published in July 2025 by the International Society for Pharmaceutical Engineering — a 290-page stand-alone guide titled ISPE GAMP® Guide: Artificial Intelligence, designed to be used in parallel with GAMP 5 Second Edition.
This Guide will not show up on a Form 483. It will not be cited in a Warning Letter on its own. But it is the document FDA and EMA inspectors will quietly assume you have read when they ask why your AI-enabled CTMS module, your pharmacovigilance triage model, or your clinical narrative generator has the validation footprint it does. Conformance to it is fast becoming the practical floor for AI under GxP.
We have spent the last six months working through the Guide's framework alongside clinical data and pharmacovigilance teams. This article is the version we wish we had when we started: the parts that matter, the parts that are genuinely new, and the parts a clinical research team can actually operationalize.
TL;DR
- **Two documents, not one.** GAMP 5 Second Edition (2022) added Appendix D11 covering AI/ML at a high level. The stand-alone *ISPE GAMP Guide: Artificial Intelligence* (July 2025) extends D11 into a full 290-page treatment. You need both.
- **Industry-built, not regulator-issued.** The AI Guide was developed by 20+ contributors through the GAMP Community of Practice's Software Automation and AI Special Interest Group, with co-leads from Syneos Health, The Triality Group, and others. That matters: it is the closest thing to a consensus best-practice document the industry has, and it is being written into supplier contracts and internal SOPs faster than any single regulatory paper.
- **The framework is the GAMP V-model, adapted.** Concept → project → operation → retirement. What changes is what fills each phase: data governance, model lifecycle, drift monitoring, and change control specific to AI replace traditional code-validation activities.
- **Five things are genuinely new versus traditional GAMP.** AI-specific Quality Risk Management, dynamic systems handling, AI cybersecurity (including adversarial attacks and prompt injection), AI as and within medical devices, and a deeper treatment of supplier and service-provider qualification for AI vendors.
- **Clinical research is in scope.** CROs, eClinical platforms, AI-enabled patient recruitment, pharmacovigilance triage, eTMF auto-classification, AI medical writing, and AI in safety signal detection all fall under this Guide once they touch GxP-regulated processes.
Why this Guide matters now
ISPE has been the de facto standard-setter for computerized system validation in pharma since the late 1990s. GAMP 4 set the V-model standard. GAMP 5 (2008) introduced the risk-based approach that made validation tractable. The 2022 Second Edition modernized GAMP for cloud, agile, open-source, and a first pass at AI/ML through Appendix D11.
By 2023 it was clear D11 was not enough. Generative AI had collapsed the timelines on which compliance frameworks were written. The GAMP AI Special Interest Group, which had been working on AI guidance since 2019, accelerated. The result is a 290-page stand-alone Guide that does for AI in regulated life sciences what GAMP 5 did for general computerized systems: provides a common language, a defensible framework, and a practical floor.
Regulators are not silent on this. The FDA's January 2025 draft guidance on AI for regulatory decision-making and its January 2025 draft on AI-enabled device software functions both align conceptually. The EMA's final 2024 reflection paper points the same direction. EU draft Annex 22 (mid-2025) brings AI validation into European GMP territory specifically. None of those documents carry the operational depth of the ISPE GAMP AI Guide. In practice, when a regulator asks how you validated an AI system — not whether — the GAMP AI Guide is the answer most large sponsors and CROs will give.
The two-document foundation
Treating the AI Guide as a stand-alone is a mistake. It is explicitly designed as a companion. The way the two documents work together looks roughly like this:
GAMP 5 Second Edition (2022) still owns:
- The risk-based validation philosophy
- Software categorization (Categories 1, 3, 4, 5)
- The V-model lifecycle for general computerized systems
- The five guiding principles: product and process understanding, lifecycle approach, scalable lifecycle activities, science-based QRM, and leveraging supplier involvement
- 21 CFR Part 11 and EU Annex 11 alignment for general systems
- Appendix D11, which provides the AI/ML on-ramp
The ISPE GAMP AI Guide (2025) owns:
- A holistic AI lifecycle from ideation through retirement
- Data governance and model governance frameworks
- AI-specific Quality Risk Management
- Handling of dynamic (continuously learning or frequently retrained) systems
- AI cybersecurity considerations
- Roles and responsibilities for AI development and operation
- Supplier and service-provider qualification specific to AI
- AI in and as medical device
- Inspection readiness for AI systems
What is actually new
The five things genuinely new in the AI Guide are also the five places teams predictably get caught. We have watched this play out across audits, vendor evaluations, and validation rewrites over the last six months — the failure patterns are remarkably consistent, and they cluster in the same five areas the Guide expands.
1. AI-specific Quality Risk Management
Traditional GAMP QRM, rooted in ICH Q9, asks what could go wrong with the system and how badly it could affect patient safety, product quality, and data integrity. The AI Guide extends this with risk categories that traditional CSV does not handle well: training data bias, distributional shift, algorithmic error modes that have no analog in deterministic code, model overfitting, and model drift. The Guide pushes risk assessment from a one-time activity at validation toward a continuous practice across the AI lifecycle. The implication is operational: you cannot validate an AI system in February and assume it is still validated in November without ongoing performance and drift monitoring evidence.
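The monitoring evidence this implies can start small: a scheduled, archived comparison of production input distributions against the training reference. A minimal sketch, assuming tabular features and scipy; the alert threshold and function names are illustrative, not taken from the Guide.

```python
# Minimal drift check: compare production inputs against the training
# reference distribution, feature by feature, with a two-sample KS test.
# The threshold is an illustrative assumption, not a Guide requirement.
import numpy as np
from scipy import stats

DRIFT_P_VALUE = 0.01  # alert threshold; set per the validated risk assessment

def drift_report(reference: np.ndarray, production: np.ndarray,
                 feature_names: list[str]) -> dict[str, dict]:
    """Return per-feature KS statistics as audit-ready drift evidence."""
    report = {}
    for i, name in enumerate(feature_names):
        stat, p = stats.ks_2samp(reference[:, i], production[:, i])
        report[name] = {
            "ks_statistic": round(float(stat), 4),
            "p_value": float(p),
            "drifted": p < DRIFT_P_VALUE,
        }
    return report
```

Run on a schedule and archived with timestamps, output like this is the evidence that answers the February-to-November question.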
2. Dynamic systems
GAMP 5 Second Edition introduced the concept of dynamic systems in Appendix D11 — systems that change behavior between formal change events, whether through retraining, online learning, or shifting input distributions. The AI Guide expands this substantially. It treats the dynamic-systems problem across all three phases: concept (decide upfront whether and how the system will adapt), project (define triggers for retraining and re-qualification), and operation (monitor for drift and act on it).
This is the part most clinical research teams underestimate. A static model deployed in 2025 trained on patient data through 2024 is not the same model in mid-2026 if patient populations or clinical practice patterns have shifted. The AI Guide forces an explicit decision: lock the model and accept performance decay, or build the retraining and re-validation infrastructure to handle change-controlled adaptation.
The most common pattern we see is a model deployed in production with no explicit decision about adaptation — neither formally locked nor formally adaptive, just running. When the question gets asked in audit prep, the team discovers they have been operating a dynamic system without the controls a dynamic system requires. The retrofit is significantly harder than the upfront decision would have been.
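One way to avoid that retrofit is to record the lock-or-adapt decision as a structured, reviewable artifact at deployment rather than as a meeting note. A sketch under assumed field names; nothing here is a Guide-mandated schema.

```python
# Explicit, reviewable record of the lock-or-adapt decision.
# Field names and values are illustrative assumptions.
from dataclasses import dataclass, field
from enum import Enum

class AdaptationMode(Enum):
    LOCKED = "locked"                # model frozen; accept performance decay
    SCHEDULED_RETRAIN = "scheduled"  # retraining permitted within gates

@dataclass
class AdaptationPolicy:
    mode: AdaptationMode
    retrain_cadence_days: int | None = None   # None when LOCKED
    performance_floor: dict[str, float] = field(default_factory=dict)
    requalification_triggers: list[str] = field(default_factory=list)

policy = AdaptationPolicy(
    mode=AdaptationMode.SCHEDULED_RETRAIN,
    retrain_cadence_days=90,
    performance_floor={"recall": 0.92, "precision": 0.85},
    requalification_triggers=[
        "architecture or hyperparameter change",
        "new data source added to training set",
        "performance below floor on two consecutive reviews",
    ],
)
```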
3. AI cybersecurity
The Guide treats cybersecurity for AI as distinct from traditional system cybersecurity. It addresses adversarial attacks on training data (data poisoning), adversarial inputs at inference time (prompt injection for LLMs, adversarial examples for vision models), and model theft or extraction. None of these threats existed in the traditional GAMP threat model. The Guide does not solve them — it requires you to have a position on them.
For clinical research specifically, the prompt-injection risk for LLM-based tools is not academic. An AI medical writing assistant that ingests a study report and is then exposed to user prompts can be manipulated into producing outputs that would never pass review. The Guide pushes teams to think about this at design, not after a finding.
4. AI as and in medical device
The Guide embeds medical device design review and risk management practices into the general framework, recognizing that AI increasingly sits inside or alongside SaMD (Software as a Medical Device). For clinical research, this matters when AI tools cross the line into supporting clinical decisions or when AI features are embedded in devices used in trials. The IEC 62304 and ISO 14971 expectations show up in the Guide alongside GxP language.
5. Supplier and service-provider qualification
GAMP 5 has always emphasized leveraging supplier involvement. The AI Guide adds AI-specific qualification expectations: training data provenance, model documentation, performance evidence under the supplier's testing conditions, change-notification commitments, and explainability artifacts where appropriate. This is the lever clinical research teams will feel first — every AI vendor pitch in 2026 will be tested against these criteria.
The AI lifecycle
Where the Guide does its most useful work is in redistributing the GAMP V-model load. The lifecycle is recognizable — four phases — but the weight shifts to the front (concept, intended use) and to operation (monitoring, drift, change control), not to the validation handoff that traditional CSV treats as the centerpiece.
Concept phase. Define intended use precisely. Decide whether the system will be static or dynamic. Establish a preliminary risk assessment that will be refined later. Make the build-buy-partner decision with AI-specific evaluation criteria. This is the phase most teams skip or compress, and most downstream pain originates here.
Project phase. This is where the AI lifecycle diverges most visibly from traditional CSV. Data governance comes online: data sourcing, lineage, quality assessment, bias evaluation, and train/validation/test split discipline. Model development happens with documented experiment tracking. Performance metrics are defined against intended use, not against generic accuracy. The verification and qualification activities mirror GAMP IQ/OQ/PQ but with model-specific tests: performance against representative held-out data, robustness testing, bias testing across relevant subpopulations, and explainability evaluation where required.
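Bias testing across subpopulations is the project-phase activity with the least CSV precedent, so it is worth showing how mechanical it can be. A minimal sketch using pandas and scikit-learn; the grouping column, metric choice, and disparity threshold are illustrative assumptions.

```python
# Evaluate a classifier's performance per subpopulation and flag gaps.
# Column names and the disparity threshold are illustrative assumptions.
import pandas as pd
from sklearn.metrics import recall_score

def subgroup_recall(df: pd.DataFrame, group_col: str,
                    y_true: str, y_pred: str,
                    max_gap: float = 0.05) -> pd.DataFrame:
    """Per-group recall, flagging any group trailing the best by > max_gap."""
    rows = []
    for group, sub in df.groupby(group_col):
        rows.append({group_col: group,
                     "n": len(sub),
                     "recall": recall_score(sub[y_true], sub[y_pred])})
    out = pd.DataFrame(rows)
    out["flagged"] = out["recall"] < out["recall"].max() - max_gap
    return out

# e.g. subgroup_recall(holdout, group_col="age_band", y_true="label", y_pred="pred")
```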
Operation phase. The longest phase and the one that breaks traditional CSV. Continuous monitoring for performance, drift, and data quality. Logging that supports audit reconstruction. Periodic review with documented sign-off. Change control adapted for AI: when does retraining trigger re-qualification, and at what depth?
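Logging that supports audit reconstruction means every inference can be tied back to a model version, an input, and a timestamp. A standard-library sketch; the field set is an assumption, not a mandated schema, and hashing the input is one way to keep patient data out of the log itself.

```python
# Append one audit-ready record per inference (JSON Lines).
# The field set is an illustrative assumption, not a mandated schema.
import datetime
import hashlib
import json

def log_inference(model_version: str, input_payload: dict,
                  output: dict, log_path: str = "inference_audit.jsonl") -> None:
    record = {
        "timestamp_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model_version": model_version,
        # Hash rather than store the raw input if it contains patient data.
        "input_sha256": hashlib.sha256(
            json.dumps(input_payload, sort_keys=True).encode()).hexdigest(),
        "output": output,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```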
Retirement. Often forgotten. The Guide explicitly addresses model retirement, archival of training data and model artifacts for the regulatory retention period, and migration paths.
Risk management adapted for AI
The Guide's QRM framework is one of its most useful contributions. The traditional FMEA approach — list failure modes, score severity and probability, derive risk priority numbers — does not gracefully accommodate distributional risk. The Guide layers AI-specific considerations onto the existing QRM scaffold rather than replacing it.
In practice, what we see working is a two-tier risk assessment. The first tier is the GAMP categorical assessment: what is the system, what category does it fall in (1, 3, 4, or 5), what GxP processes does it influence, and what is the impact severity if it fails. The second tier is AI-specific: what is the data risk profile (sources, freshness, bias potential, representativeness), what is the model risk profile (architecture transparency, sensitivity to input shifts, known failure modes), and what is the deployment risk profile (autonomy level, human oversight, reversibility of outputs).
The autonomy and reversibility dimensions matter most. An AI system that drafts a clinical narrative for human review carries fundamentally different risk than one that auto-codes adverse events into MedDRA without human-in-the-loop checking. The Guide pushes teams to be explicit about where in the spectrum each AI use case sits, and to scale validation effort accordingly.
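The two-tier assessment lends itself to a structured record, so that qualification depth is derived per system rather than argued per meeting. A sketch in which the tiers mirror the text above but the scoring weights, including the double weight on autonomy, are illustrative assumptions.

```python
# Two-tier risk record. Tiers mirror the assessment described above;
# the scoring weights and depth thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class GampTier:
    category: int          # GAMP software category: 1, 3, 4, or 5 (recorded)
    gxp_impact: int        # 1 (low) .. 3 (high): severity if the system fails

@dataclass
class AiTier:
    data_risk: int         # sources, freshness, bias potential (1..3)
    model_risk: int        # transparency, input sensitivity (1..3)
    autonomy_risk: int     # human oversight, reversibility (1..3)

def qualification_depth(gamp: GampTier, ai: AiTier) -> str:
    """Illustrative roll-up in which autonomy and reversibility dominate."""
    score = gamp.gxp_impact + ai.data_risk + ai.model_risk + 2 * ai.autonomy_risk
    return "full" if score >= 10 else "standard" if score >= 7 else "reduced"
```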
Dynamic systems and the change-control problem
The hardest section to operationalize is dynamic systems. Traditional change control assumes discrete change events: a code change is proposed, evaluated, tested, approved, deployed. AI breaks this in two ways. First, retraining is often scheduled and routine, not exceptional. Second, the system can shift in behavior between formal changes, simply because input data distributions move.
The Guide's framework for this is to define adaptation boundaries upfront. What kinds of changes are permitted under what conditions? A model that retrains weekly on the same architecture with quality-controlled data and pre-defined performance gates is one regime. A model that can change architecture or hyperparameters under retraining is another. Each regime has a different change-control footprint.
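The first regime (same architecture, quality-controlled data, pre-defined gates) lends itself to an automated promotion check, with anything that fails a gate routed back into manual change control. A sketch; the metric names and floors are assumptions.

```python
# Gate a retrained candidate model against pre-defined floors before
# promotion. Metric names and floors are illustrative assumptions.
CANDIDATE_GATES = {"recall": 0.92, "precision": 0.85, "auroc": 0.90}

def passes_gates(candidate_metrics: dict[str, float],
                 gates: dict[str, float] = CANDIDATE_GATES) -> tuple[bool, list[str]]:
    """Return (promotable, failed gates) for the change-control record."""
    failures = [f"{m}: {candidate_metrics.get(m, 0.0):.3f} < {floor:.3f}"
                for m, floor in gates.items()
                if candidate_metrics.get(m, 0.0) < floor]
    return (not failures, failures)
```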
The practical implication for clinical research: most off-the-shelf AI tools your CRO or sponsor procures will not have well-defined adaptation boundaries documented by the vendor. The qualification effort will involve negotiating those boundaries explicitly, often as contractual change-notification requirements.
Cybersecurity
The Guide treats AI cybersecurity as part of the validation conversation, not as a separate IT problem. Three threat categories get explicit treatment.
Data integrity attacks. Poisoning of training data, whether by adversarial actors or by upstream data quality issues, can produce models that look performant on standard test sets but fail in deployment. The Guide pushes for data lineage and provenance controls strong enough to detect tampering.
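Provenance controls strong enough to detect tampering can start with content hashes recorded at data acceptance and re-verified before every training run. A minimal standard-library sketch:

```python
# Fingerprint training data at acceptance; re-verify before each run.
import hashlib
from pathlib import Path

def fingerprint(path: str) -> str:
    """SHA-256 of a training data file, recorded at acceptance."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(manifest: dict[str, str]) -> list[str]:
    """Return files whose current hash no longer matches the accepted hash."""
    return [p for p, accepted in manifest.items()
            if not Path(p).exists() or fingerprint(p) != accepted]
```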
Inference-time attacks. Adversarial examples for vision models, prompt injection for LLMs, and jailbreaking for any conversational system. The Guide expects teams to evaluate exposure to these threats during the project phase and to put in place input filtering, output validation, and monitoring proportionate to risk.
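For LLM tools, input filtering can be as simple as screening text bound for the prompt for instruction-like patterns and routing matches to human review. A deliberately crude sketch; the patterns are illustrative, no pattern list is complete, and this is one layer of a proportionate control set, not a defense on its own.

```python
# Crude screen for instruction-like content in text destined for an LLM
# prompt. Patterns are illustrative; screening is one layer, not a defense.
import re

INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior) (instructions|prompts)",
    r"disregard (the )?system prompt",
    r"you are now",
]

def screen_for_injection(text: str) -> list[str]:
    """Return matched patterns; non-empty means route to human review."""
    return [p for p in INJECTION_PATTERNS
            if re.search(p, text, flags=re.IGNORECASE)]
```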
Model exfiltration. Extracting a proprietary model through API queries. Less directly relevant to clinical research workflows, but increasingly relevant to sponsors who consider their AI assets competitive infrastructure.
We expect FDA inspection questions on prompt injection specifically to start appearing in 2026 for any sponsor using LLMs in regulatory document workflows. The Guide is the framework that supports a defensible answer.
Supplier qualification and the AI vendor question
GAMP 5 has always permitted leveraging supplier development and testing where appropriately qualified. For AI vendors, the qualification bar is higher and more specific. The Guide enumerates expectations including: documented training data sources and characteristics, evidence of bias evaluation, model performance documentation, change-management commitments including notification of retraining, security controls, and explainability artifacts where the use case requires.
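Those expectations translate almost directly into an RFP scoring artifact. A sketch in which the evidence items paraphrase the list above and the required/optional split is an illustrative assumption.

```python
# Score an AI vendor's qualification evidence. Items paraphrase the
# expectations above; the required/optional split is an assumption.
SUPPLIER_EVIDENCE = {
    "training_data_provenance": True,     # True = required
    "bias_evaluation_report": True,
    "model_performance_documentation": True,
    "retraining_notification_commitment": True,
    "security_controls_summary": True,
    "explainability_artifacts": False,    # required only for some use cases
}

def evidence_gaps(provided: set[str]) -> list[str]:
    """Return required evidence items the vendor has not supplied."""
    return [item for item, required in SUPPLIER_EVIDENCE.items()
            if required and item not in provided]
```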
The asymmetry here is real. A traditional eClinical SaaS vendor with a mature SOC 2 program and ICH-aligned validation evidence is a known quantity. An AI startup in 2026, even one with strong technical capability, often does not have the documentation maturity to support a Cat 4 or Cat 5 GxP qualification out of the box. Sponsors and CROs that want to deploy these tools have three realistic paths: invest heavily in vendor uplift, take on more of the qualification effort internally, or wait for the AI vendor ecosystem to mature into GAMP-readiness.
The pattern we see most often during procurement is sponsors discovering that their preferred AI vendor has no documented process for notifying customers when the underlying model is retrained — and in some cases no internal awareness that retraining notification would be expected at all. The contract gets renegotiated mid-procurement, the legal review extends by weeks, or the deployment slips by a quarter while the vendor builds the notification mechanism. Building this expectation into the RFP rather than discovering it at contract review is the cheapest version of this problem.
We see all three paths in practice. The most common is a hybrid: vendor uplift on the highest-risk components combined with sponsor-side qualification on data and operational controls.
How GAMP fits with FDA, EMA, and the EU AI Act
The Guide's place in the broader regulatory landscape is as the operational layer underneath the major principles documents emerging in parallel. The regulators set direction; the Guide tells you how to operationalize that direction in practice.
FDA. The January 2025 draft guidance on AI for regulatory decision-making and the FDA's 7-step credibility assessment framework are conceptually compatible. GAMP fills in the operational detail FDA's guidance leaves to industry. The Guide can serve as the validation backbone behind a credibility assessment package.
EMA. The 2024 reflection paper on AI in the medicinal product lifecycle is principles-led. The GAMP AI Guide provides the operational scaffolding. EU draft Annex 22 (2025) for AI in pharma manufacturing is more prescriptive and aligns with several Guide sections directly, particularly on continuous monitoring and supplier oversight.
EU AI Act. For AI systems classified as high-risk under the Act — and several clinical research applications will be — the Act's requirements on risk management, data governance, technical documentation, and human oversight map cleanly onto the GAMP AI Guide structure. The Guide does not satisfy the Act on its own, but a GAMP-compliant AI system has most of the documentary skeleton an AI Act conformity assessment will need.
We cover the FDA credibility framework in Guide 01 and the EMA reflection paper in Guide 02. The EU AI Act treatment specific to pharma is forthcoming in Guide 05.
What clinical research teams should actually do
The Guide is 290 pages and runs $515+ from ISPE. Most clinical research practitioners — biostatisticians, clinical data managers, pharmacovigilance leads, medical writers — will not read it cover to cover. They will encounter it through their internal QA function, vendor evaluations, or audit findings. Five practical moves are worth making now.
First, inventory your AI footprint. Most organizations underestimate what counts as AI under the Guide. AI-assisted patient recruitment, PV signal detection, eTMF auto-classification, AI-assisted medical writing, AI clinical narrative generators, AI-enabled risk-based monitoring, and increasingly AI-assisted protocol design all fall in scope when they touch GxP processes. An honest inventory is the precondition for everything else.
Second, classify each system on the autonomy spectrum. Where on the human-oversight scale does each tool sit? What outputs are reviewed before action, and what outputs influence GxP decisions directly? The qualification depth scales with this answer.
Third, establish or tighten data governance for AI training and inference. Lineage, provenance, quality controls, bias evaluation. This is the foundation everything else rests on, and it is the area where most teams have the largest gaps.
Fourth, get serious about post-deployment monitoring. Performance metrics defined against intended use, drift detection, periodic review with documented sign-off. The Guide expects this to be operational, not aspirational.
Fifth, rewrite supplier qualification criteria to include AI-specific evidence. Push back on vendors who cannot produce training data documentation, bias evidence, or change-notification commitments. The market will move toward providing these. Demand accelerates the move.
For clinical research practitioners building AI workflow capability inside regulated environments, the GAMP AI Guide is not a constraint to be navigated. It is the floor on which durable, inspection-ready AI capability gets built. Teams that internalize it now will move faster, not slower, over the next three years — because they will not be relitigating validation foundations every time an inspector asks how the model works.
References
- ISPE. [*GAMP® Guide: Artificial Intelligence*](https://ispe.org/publications/guidance-documents/gamp-guide-artificial-intelligence). July 2025.
- ISPE. *GAMP® 5: A Risk-Based Approach to Compliant GxP Computerized Systems*, Second Edition. 2022.
- ISPE. "New GAMP® Guide Addresses Challenges Posed by AI-Enabled Computerized Systems." *Pharmaceutical Engineering*, July 2025.
- ISPE. "New GAMP® Guide Provides Framework to Achieve High-Quality AI-Enabled Computerized Systems for Life Sciences." *Pharmaceutical Engineering*, August 2025.
- FDA. *Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products* (Draft Guidance). January 2025.
- FDA. *Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations* (Draft Guidance). January 2025.
- EMA. *Reflection Paper on the Use of Artificial Intelligence (AI) in the Medicinal Product Lifecycle*. 2024.
- EMA / PIC/S. Draft *Annex 22: Artificial Intelligence*. 2025.
- ICH. *Q9(R1): Quality Risk Management*. 2023.
This is Guide 03 in the ClinStacks AI Compliance series. Next in the series: 21 CFR Part 11 in the age of AI — electronic records, audit trails, and model versioning for AI systems.