When AI Writes the Master Batch Record: The New CGMP Risk in Biopharmaceutical Manufacturing

A 2026 FDA warning letter exposes a larger problem: AI can generate plausible quality documentation, but CGMP still demands validated evidence, accountable human review, and deterministic control

Apr 29, 2026

In April 2026, the U.S. Food and Drug Administration (FDA) issued a warning letter to Purolea Cosmetics Lab, a small manufacturer of homeopathic products. FDA inspectors found gross insanitary conditions, including “significant disrepair” of the facility, insects in the manufacturing area, and contamination from leaves and filth [1]. More concerning for regulators, Purolea had used an unvalidated AI agent to author drug‑product specifications, standard operating procedures (SOPs), and batch records [1]. When inspectors requested the required process‑validation data, management explained that they did not know a validation study was necessary because the AI system had not flagged this requirement [1]. The FDA forced the firm to halt production. The letter emphasized that any AI‑generated documentation must be reviewed and approved by the quality unit; failure to do so violates 21 CFR § 211.22(c) [1]. Although Purolea operates in a low‑risk sector, its failure serves as a stress test for the entire biopharmaceutical industry, illustrating the clash between probabilistic AI outputs and the deterministic requirements of current good manufacturing practice (CGMP).

In tier‑1 biopharma, the risk isn’t consumer‑grade AI writing SOPs. The real danger is proprietary “enterprise AI” used to optimize bioreactor feed strategies, viral vector production, or empty‑capsid ratios. These models are often seen as too complex to question, and their predictions can affect multi-million-dollar manufacturing decisions. Trusting such output without thorough validation and human oversight could result in contaminated batches and patient harm.

The sycophancy trap: AI prioritizes agreement over truth

A 2025 study in npj Digital Medicine highlights an inherent risk of large language models (LLMs). Researchers tested five advanced models with prompts intentionally containing medically incorrect requests. The models complied with the false prompts almost 100% of the time [2], prioritizing helpfulness over factual accuracy. Even after fine‑tuning, some models still followed misinformation requests at high rates (up to 100% for certain GPT‑4 models and 94% for others) [2]. The authors labeled this behavior “sycophancy,” an AI tendency to agree with a user’s prompt rather than question its assumptions [2]. In a manufacturing setting, this can appear as yield‑optimization bias. If a plant manager asks an AI to “maximize adeno‑associated virus (AAV) yield,” a sycophantic model may suggest aggressive bioreactor settings that increase total capsid production but fail to account for the ratio of empty to full capsids or downstream safety implications.

Managers at Purolea believed the AI understood regulatory requirements, but it only produced plausible text. LLMs are good at pattern matching and creating convincing language, but they lack understanding of legal and biological complexities. This gap between credible language and regulatory knowledge should guide AI use in CGMP settings.

Gene therapy: complex biology leaves no room for AI error

Gene and cell therapies depend on complex biological processes that are inherently variable. Even small deviations can impact product quality and patient safety. Studies of AAV manufacturing show that viral capsid assembly is variable and stochastic, resulting in a diverse population of capsids with unique compositions [3]. Production yields not only full capsids carrying the therapeutic gene but also empty capsids and intermediate capsids that contain truncated genomes, plasmid fragments, or host‑cell DNA [4]. These impurities must be measured and managed because they decrease potency and may trigger immune responses [4].

The consequences of poor impurity control were demonstrated in a 2026 Nature Medicine report describing a child who developed severe hepatitis after receiving onasemnogene abeparvovec, an AAV gene therapy. Metagenomic sequencing of the patient’s liver identified sequences from all three manufacturing plasmids and showed extensive disruption of vector genomes [5]. The authors noted that plasmid remnants and concatemers remained in liver tissue despite purification [5]. Similar impurity profiles have been seen in other advanced therapy medicinal products. Lentiviral vectors and CRISPR‑edited cell therapies carry their own risks of off‑target integration and oncogenic effects.

AI models used to optimize transfection conditions or purification parameters should be regarded as hypothesis‑generating tools. A recommendation to reduce empty‑capsid content must be independently confirmed by orthogonal assays. For AAV products, this includes droplet digital PCR (ddPCR) and native mass spectrometry [3]; for cell therapies, it may involve flow cytometry and functional assays. A position paper on academic CAR‑T cell manufacturing recommends potency testing via flow‑cytometry‑based cytotoxicity and proliferation assays, mycoplasma and endotoxin testing, quantification of vector copy number, and cytokine release assays to ensure safety and efficacy [6]. The Purolea incident illustrates what can happen when AI output is accepted without such empirical verification: regulatory requirements were overlooked, and production was halted. In high‑risk therapies, the consequences could be much more severe.
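
The idea of treating an AI recommendation as a hypothesis can be sketched in a few lines of Python. This is a minimal illustration, not a validated procedure: the `AssayResult` type, the `confirm_prediction` function, the method labels, the 0.05 tolerance, and the two-method minimum are all hypothetical choices made for the example. In practice the acceptance criteria would come from the firm's own validated specifications.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class AssayResult:
    method: str           # label for the orthogonal method (illustrative only)
    full_fraction: float  # measured fraction of full capsids, between 0 and 1


def confirm_prediction(predicted: float,
                       assays: list[AssayResult],
                       tolerance: float = 0.05,   # hypothetical acceptance limit
                       min_methods: int = 2) -> dict:
    """Accept an AI-predicted full-capsid fraction only if independent
    assays agree with each other and with the prediction."""
    # Require results from at least `min_methods` distinct methods.
    if len({a.method for a in assays}) < min_methods:
        return {"confirmed": False, "reason": "insufficient orthogonal methods"}

    # The orthogonal assays must first agree with one another.
    values = [a.full_fraction for a in assays]
    if max(values) - min(values) > tolerance:
        return {"confirmed": False, "reason": "assays disagree with each other"}

    # Only then is the prediction compared with the measured value.
    measured = mean(values)
    if abs(measured - predicted) > tolerance:
        return {"confirmed": False,
                "reason": "prediction outside tolerance",
                "measured": measured}
    return {"confirmed": True, "reason": "prediction confirmed",
            "measured": measured}


if __name__ == "__main__":
    results = [AssayResult("ddPCR", 0.68), AssayResult("native MS", 0.71)]
    print(confirm_prediction(0.70, results))
    print(confirm_prediction(0.90, results))
```

The design choice worth noting is the ordering: the model's number is never compared with a single assay. A prediction supported by only one method, or by methods that disagree, is returned as unconfirmed and left for human review.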

Hallucinations and the “bixonimania” experiment

AI’s ability to hallucinate plausible yet false medical information has been clearly demonstrated. In 2026, researchers created a fictional skin condition called bixonimania and uploaded fake preprints about it. Frontier models, including Google’s Gemini, ChatGPT, Microsoft Copilot, and Perplexity, confidently diagnosed the nonexistent condition, linked it to “blue light exposure,” and provided ophthalmological advice [7]. These models treated the counterfeit papers as authoritative because they resembled scientific articles. For a gene-therapy quality unit, such hallucinations could lead a model to certify a batch as safe based on spurious patterns rather than real analytical data. Without human oversight and independent cross-checks, hallucinated output can pass for genuine analysis and lead to dangerous decisions.

A governance framework for AI in biopharmaceutical manufacturing

In January 2025, the FDA issued draft guidance on using AI in regulatory decision-making for drugs and biologics. The guidance presents a risk-based credibility assessment framework. It asks sponsors to define the context of use, assess the model’s risk, develop and implement a credibility assessment plan, evaluate performance metrics (such as receiver-operating characteristic curves, sensitivity, and specificity), and document the results [8]. If the model cannot demonstrate enough credibility for its intended use, sponsors are expected to reduce its influence, increase validation efforts, modify the model, or discard it [8]. Notably, the guidance stresses the importance of considering the performance of the human–AI team whenever a “human in the loop” is involved [8]. Building on this regulatory framework and lessons from Purolea, we propose eight essential pillars for responsible AI in biopharmaceutical manufacturing.

**Figure 1.** Proposed eight-pillar governance framework for responsible AI use in biopharmaceutical manufacturing.

Human ownership. Every AI‑generated SOP, batch record, or parameter optimization should be treated as a draft. A qualified person must review and approve it, taking legal responsibility. The FDA warning letter to Purolea highlighted that quality units are responsible for reviewing AI‑generated documents [1]. This follows the broader legal principle that physicians and health systems are still liable for decisions based on AI recommendations. Courts have established that practitioners cannot avoid malpractice liability by blaming flawed clinical decision support [12].
Empirical validation mandate. AI‑derived predictions of critical process parameters (CPPs) or critical quality attributes (CQAs) should be considered hypotheses. They must be validated with independent, orthogonal methods (e.g., mass spectrometry, ddPCR, flow cytometry) before informing a batch‑release decision. Because AAV capsid assembly is naturally heterogeneous [3] and intermediate capsids can contain harmful DNA sequences [4], relying solely on computational output is insufficient.
Explainability (XAI). Models used in CGMP environments should generate interpretable outputs. Tools like SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) can help quality teams understand why a model recommends a specific parameter change. If the reasoning cannot be explained, the output should not be used to support CGMP decisions.
Continuous drift monitoring. Model performance can degrade as process conditions and input data shift away from the data used for training, and adaptive models change as they are retrained. Continuous monitoring of performance using real-world data and automatic alerts for degradation are essential. The FDA draft guidance recommends documenting model performance metrics with uncertainty estimates and requalifying models when needed [8].
Data integrity and provenance. All data used to train, validate, and operate AI models must follow ALCOA+ principles (Attributable, Legible, Contemporaneous, Original, and Accurate, plus Complete, Consistent, Enduring, and Available). The chain from raw sensor input to the final decision must be auditable. This requirement aligns with ICH Q10 and CGMP data‑integrity expectations.
Validation lifecycle management. AI models should follow the same lifecycle approach as other computerized systems under GAMP 5: specification, design, testing, deployment, performance monitoring, and retirement. Each stage must be documented and re-validated as processes or data change.
Bias detection and mitigation. AI models can unintentionally reinforce biases present in training data. Regular audits are necessary to assess whether model outputs disproportionately impact specific batch types, patient groups, or product features. Bias reduction strategies (such as balanced training datasets and fairness metrics) should be put into place and documented.
Cybersecurity and data protection. Adding AI to manufacturing networks creates additional security vulnerabilities. To safeguard proprietary process data and prevent malicious alterations to model parameters or inputs, it is essential to implement access controls, encryption, and ongoing threat monitoring.
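
As a rough illustration of how the performance metrics named in the draft guidance (sensitivity and specificity) might be tracked with uncertainty estimates under the drift-monitoring pillar, the Python sketch below computes both with Wilson score intervals for one monitoring window and flags the model for requalification when a lower confidence bound falls below an acceptance limit. The function names and the 0.90 limits are assumptions made for this example; they do not come from the guidance.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)  # no data: the proportion is completely uncertain
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def review_window(tp: int, fn: int, tn: int, fp: int,
                  min_sensitivity: float = 0.90,   # hypothetical acceptance limit
                  min_specificity: float = 0.90) -> dict:
    """Summarize one monitoring window of confirmed model calls.

    tp/fn/tn/fp are counts of model calls checked against a reference
    method. Requalification is flagged when the lower confidence bound,
    not the point estimate, falls below its acceptance limit.
    """
    sens_ci = wilson_interval(tp, tp + fn)
    spec_ci = wilson_interval(tn, tn + fp)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "sensitivity_ci": sens_ci,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "specificity_ci": spec_ci,
        "requalify": sens_ci[0] < min_sensitivity or spec_ci[0] < min_specificity,
    }


if __name__ == "__main__":
    # Small window: 95% observed sensitivity, but the interval is wide.
    print(review_window(tp=95, fn=5, tn=98, fp=2))
    # Larger window: similar rates with much tighter intervals.
    print(review_window(tp=990, fn=10, tn=980, fp=20))
```

Using the lower bound rather than the point estimate means that a small window with an apparently good score can still trigger review, which is the conservative behavior a quality unit would usually want.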

The path forward: digital twins, sandboxes, and accountability

Digital twins and real‑time release testing

Digital twins, virtual models connected to physical systems, enable real-time monitoring and predictive analytics. A 2025 review on digital twins in pharmaceuticals states that they can improve operational efficiency, lower costs, and enhance product quality by supporting real-time monitoring and incorporating AI and machine learning [9]. However, the authors emphasize key challenges, such as data integration, model accuracy, and regulatory hurdles [9]. For digital twins to facilitate real-time release testing (RTRT), regulators need to shift from batch-end testing to continuous monitoring. This will require a regulatory sandbox where digital-twin models can be tested alongside traditional methods, helping to build evidence for approval.
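
One way to picture what "tested alongside traditional methods" could look like in practice is a paired comparison of digital-twin predictions against reference assay results for the same batches. The sketch below uses a Bland-Altman style agreement summary (mean bias and 95% limits of agreement). That choice of technique is an assumption for illustration; the cited review does not prescribe it, and the function name and example values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(twin: list[float], reference: list[float]) -> dict:
    """Agreement between paired digital-twin predictions and reference
    measurements: mean bias and approximate 95% limits of agreement."""
    if len(twin) != len(reference) or len(twin) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [t - r for t, r in zip(twin, reference)]
    bias = mean(diffs)      # systematic offset of the twin
    spread = stdev(diffs)   # sample standard deviation of the differences
    return {
        "bias": bias,
        "lower_loa": bias - 1.96 * spread,
        "upper_loa": bias + 1.96 * spread,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not real batch data.
    twin_titer = [10.2, 9.8, 10.5, 10.1]
    assay_titer = [10.0, 10.0, 10.3, 10.0]
    print(bland_altman(twin_titer, assay_titer))
```

Whether the resulting limits of agreement are narrow enough to support release decisions would be a question for predefined acceptance criteria, not something the calculation itself can answer.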

Regulatory sandboxes

A literature review of the sandbox approach states that regulatory sandboxes, secure environments for testing new regulatory processes, started in fintech and are now being explored in healthcare [10]. The review shows that sandboxes are mainly used in high‑income countries to support the adoption of new digital technologies; they offer controlled testing environments and can help guide policy development [10]. For biopharma manufacturing, sandboxes could enable AI‑driven digital‑twin systems to operate under real-world conditions while still being under regulatory oversight, allowing iterative improvements of both technology and policy.

Closing the accountability gap

The legal landscape for AI liability remains uncertain. An analysis in the Milbank Quarterly notes that multiple stakeholders, including physicians, health systems, and algorithm developers, share responsibility for AI-related injuries, and that current frameworks are inadequate to support both safety and innovation [12]. Courts generally treat clinical decision support software as a service, which makes it difficult for injured patients to sue developers under product liability [11]; however, some scholars predict that as deep learning systems become less interpretable, courts may adopt product liability theories [11]. To close this accountability gap, the industry might need new roles such as AI quality officers, licensed professionals trained to oversee and validate algorithmic decisions in CGMP settings. These officers would ensure that human accountability remains central while providing technical oversight of AI systems.

The competitive paradox and the “third way”

The global gene‑therapy landscape presents a paradox. A 2026 Nature Medicine editorial notes that over 50 gene therapies have already received approval worldwide, but the number of active development programs decreased in 2025 [14]. The editorial attributes this decline to safety concerns following deaths in clinical trials and to high manufacturing costs, which even led to the discontinuation of a hemophilia B therapy less than a year after its approval [14]. It urges increased investment, transparent data sharing, and predictable regulatory frameworks to sustain progress in gene therapy [14]. Meanwhile, simulation modeling indicates that gene therapy expenditures could reach about $20.4 billion annually, with roughly half allocated to treatments for non-Medicare adults and children [13]. If Western regulators demand strict validation while other regions adopt more lenient, AI-driven methods, the West risks losing competitiveness. However, lowering standards could lead to safety issues that damage public trust. The “third way” offers a middle ground: use AI to accelerate innovation and cut costs, but treat human oversight and empirical validation as non-negotiable.

Conclusion

The Purolea case signifies the end of AI’s experimental phase in biopharmaceutical manufacturing. Regulatory agencies will no longer accept unvalidated AI as a justification for CGMP failures. In the intricate realm of cell and gene therapy, where the product is the process, deterministic validation and human accountability are crucial. AI can enhance human ability by analyzing data volumes beyond what humans can process, but it must be subordinate to a strong quality system based on empirical testing, transparency, and ongoing oversight. By adopting principles like human ownership, empirical validation, explainability, and continuous monitoring, the industry can leverage AI’s capabilities without risking patient safety. Failure to do so will not only generate warning letters but could also cause serious adverse events, legal liabilities, and a loss of public trust.

References

[1] U.S. Food and Drug Administration. (2026, April 2). Warning Letter: Purolea Cosmetics Lab 722591. Retrieved from https://www.fda.gov/inspections-compliance-enforcement-and-criminal-investigations/warning-letters/purolea-cosmetics-lab-722591-04022026

[2] Chen, S., Gao, M., Sasse, K., et al. (2025). When helpfulness backfires: LLMs and the risk of false medical information due to sycophantic behavior. npj Digital Medicine, 8, 605. https://www.nature.com/articles/s41746-025-02008-z

[3] Wörner, T. P., Bennett, A., Habka, S., et al. (2021). Adeno-associated virus capsid assembly is divergent and stochastic. Nature Communications, 12, 1642. https://www.nature.com/articles/s41467-021-21935-5

[4] McColl-Carboni, A., Dollive, S., Laughlin, S., Lushi, R., et al. (2024). Analytical characterization of full, intermediate, and empty AAV capsids. Gene Therapy, 31, 285–294. https://www.nature.com/articles/s41434-024-00444-2

[5] Buddle, S., et al. (2026). Contaminating plasmid sequences and disrupted vector genomes in the liver following adeno-associated virus gene therapy. Nature Medicine, 32, 472–480. https://www.nature.com/articles/s41591-025-04073-z

[6] Marton, C., Clémenceau, B., Dachy, G., et al. (2025). Harmonisation of quality control tests for academic production of CAR‑T cells: a position paper from the WP‑bioproduction of the UNITC consortium. Bone Marrow Transplantation. https://www.nature.com/articles/s41409-025-02637-8

[7] Stokel‑Walker, C. (2026, April 7). Scientists invented a fake disease. AI told people it was real. Nature, 652, 559–561. https://www.nature.com/articles/d41586-026-01100-y

[8] U.S. Food and Drug Administration. (2025, January 6). Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products: Draft Guidance for Industry. Retrieved from https://www.fda.gov/regulatory-information/search-fda-guidance-documents/considerations-use-artificial-intelligence-support-regulatory-decision-making-drug-and-biological

[9] Maharjan, R., Kim, N. A., Kim, K. H., & Jeong, S. H. (2025). Transformative roles of digital twins from drug discovery to continuous manufacturing: pharmaceutical and biopharmaceutical perspectives. International Journal of Pharmaceutics: X, 10, 100409. https://www.sciencedirect.com/science/article/pii/S2590156725000945

[10] Leckenby, E., Dawoud, D., Bouvy, J., & Jónsson, P. (2021). The Sandbox Approach and its Potential for Use in Health Technology Assessment: A Literature Review. Applied Health Economics and Health Policy, 19(6), 857–869. https://pmc.ncbi.nlm.nih.gov/articles/PMC8545721/

[11] Evans, B. J., & Pasquale, F. (2022). Product liability suits for FDA‑regulated AI/ML software. In I. Glenn Cohen, Timo Minssen, W. Nicholson Price II, Christopher Robertson, & Carmel Shachar (Eds.), The Future of Medical Device Regulation: Innovation and Protection. Cambridge University Press.

[12] Maliha, G., Gerke, S., Cohen, I. G., & Parikh, R. B. (2021). Artificial intelligence and liability in medicine: balancing safety and innovation. The Milbank Quarterly, 99(3), 629–647. https://pmc.ncbi.nlm.nih.gov/articles/PMC8452365/

[13] Wong, C. H., Li, D., Wang, N., Gruber, J., Lo, A. W., & Conti, R. M. (2023). The estimated annual financial impact of gene therapy in the United States. Gene Therapy, 30, 761–773. https://www.nature.com/articles/s41434-023-00419-9

[14] Nature Medicine Editorial Board. (2026, March 13). Keep up the momentum for gene therapies. Nature Medicine, 32, 769. https://www.nature.com/articles/s41591-026-04311-y

Abraham Samuel Finny | Analytical Scientist
