Artificial intelligence (AI) is rapidly reshaping clinical care. AI-based technologies are enabling increasingly autonomous care delivery, making AI a critical tool for addressing workforce gaps, expanding access, and improving patient outcomes in the U.S. healthcare system. In November 2025, PHTI convened senior leaders from health systems, health plans, technology developers, academia, investment firms, and federal agencies—including clinical experts—for a workshop in Washington, DC, to explore what is needed to scale AI for autonomous healthcare delivery.

The workshop focused on the requirements for safe, effective, and scalable use of clinical AI—with autonomous prescribing for hypertension management and mental health chatbots as illustrative use cases.

Executive Summary

The opportunity for AI in hypertension and mental health care 

Nearly half of U.S. adults—approximately 120 million people—have hypertension. Yet despite the availability of well-established, generally low-cost, and highly effective treatments, only one in four patients achieves blood pressure control. More than one in five U.S. adults live with a mental illness, yet many never receive treatment because of cost barriers, clinician workforce shortages, and other factors.

These gaps in care for both hypertension and mental health reflect persistent failures in our current healthcare system. Effectively managing both conditions requires broad, up-front screening; active treatment; medication initiation and titration; attention to side effects; sustained patient engagement; and support for long-term behavior changes. Current care delivery models lack the capability and capacity to provide this level of continuous management, resulting in widespread underdiagnosis and undertreatment.

Clinicians and patients are increasingly turning to AI to fill these gaps. For hypertension management, AI supports measurement, diagnosis, and treatment decisions, with emerging solutions operating increasingly autonomously. In mental health, many patients rely on publicly available generative AI tools for support; however, these tools are typically not designed for that purpose and lack clinical rigor and validation. Purpose-built mental health chatbots—trained on cognitive behavioral therapy and other psychotherapy principles—are now entering the market with promising clinical outcomes for patients with mild-to-moderate symptoms.

With the policy landscape evolving at unprecedented speed to adapt to these technological advances, clinical AI applications will be positioned to deliver high-quality, accessible care to millions of patients. Yet moving from promise to scale requires confronting market realities.

What clinical evidence, performance monitoring, and regulatory changes are necessary to build confidence in these tools for purchasers, clinicians, and patients?

The workshop discussion spanned a common set of questions across two use cases: 

Adoption and Evaluation of Progress 

  • What would need to be true for clinicians to accept and adopt these tools? 
  • How will the market know whether these tools are driving meaningful progress? 
  • What factors would encourage payers to cover clinical-grade solutions? 
  • What would help patients feel confident and safe using them? 

Market and Policy Enablers 

  • What barriers would limit effective adoption today? 
  • How might stakeholders work to accelerate adoption? 
  • What insights from these case studies inform regulation more broadly? 

From promise to scale: Strengthening market confidence   

Purchasers need standards to assess quality and value; innovators need guidance on product development and evidence generation requirements to meet regulatory and market needs; and clinicians need to understand which tools work and for which populations. 

Four key themes emerged from the discussion:  

1: Evidence standards should compare AI to current standards of care and scale with clinical risk.

Evidence requirements must be rigorous enough to build trust, yet practical enough to avoid stalling innovation. This means tiering evidence standards to the clinical risk posed by the AI tool. Autonomous AI tools should be compared to local conditions and the care that patients receive today, not to idealized care. For many patients, the realistic alternative may be poor access or no treatment at all.

2: Performance benchmarks should be based on clinical outcomes, and safety standards should adapt as the evidence grows.

Ambiguity about what constitutes “good” performance remains a barrier to adoption. Metrics must be anchored to specific, measurable, and meaningful clinical outcomes, rather than to process measures.

3: New technologies may be initially tested in lower-risk populations but should scale quickly to high-risk populations to maximize impact.

Lower-risk patients offer tempting on-ramps, but the greatest opportunities for clinical benefit from AI-enabled solutions come from reaching the highest-need patients, including those with higher-complexity conditions and those in underserved communities. Reaching these populations may require higher evidence expectations and may carry greater clinical risk.

4: Widespread adoption will depend on building clinician confidence, gaining clarity about legal liability, and aligning payment models.

Even highly effective clinical AI faces adoption resistance rooted in norms and culture, concerns about liability, and misaligned incentives. Many health systems are uneasy moving from “some human involvement” to “little-to-no human involvement.” These tensions are shaping regulatory frameworks and the evolution of clinicians’ roles as adoption grows.

The discussion underscored a central reality: the technologies enabling autonomous clinical AI are advancing faster than the policy, payment, and evidentiary frameworks, and the organizational readiness, needed to support their adoption. Participants identified meaningful pathways forward for both hypertension management and mental health chatbots but also surfaced unresolved tensions that will require sustained, cross-stakeholder dialogue. The themes that emerged are not unique to these use cases. They reflect foundational questions that will recur as autonomous AI capabilities expand across clinical domains.