6 autonomous AI agents. 11 testing types. MCP protocol integration. Playwright + Vibium execution. Complete test lifecycle management. Self-hosted, zero vendor lock-in.
Tests fail. That is not the problem.
The problem is that most tools just report the failure and move on.
What if your platform could triage, learn, adapt,
and rise smarter after every run?
Phoenix does.
Task-aware routing: classify-failure → fast-accurate · release-verdict → reasoning · summarize-run → fast-cheap
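The routing above can be sketched as a simple lookup table. A minimal illustration only: the tier names come from the line above, but the function name and the fallback behavior are assumptions, not Phoenix's actual implementation.

```typescript
// Hypothetical sketch of task-aware model routing: each task type maps to a
// model tier with a different cost/latency/quality trade-off.
type ModelTier = "fast-accurate" | "reasoning" | "fast-cheap";

const ROUTES: Record<string, ModelTier> = {
  "classify-failure": "fast-accurate", // triage needs precision at speed
  "release-verdict": "reasoning",      // GO/NO-GO gets the strongest model
  "summarize-run": "fast-cheap",       // summaries tolerate a cheaper model
};

function routeTask(task: string): ModelTier {
  // Assumed fallback: unrecognized task types go to the cheapest tier.
  return ROUTES[task] ?? "fast-cheap";
}
```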
Each agent runs a ReAct loop: observe → reason → act → observe result. Proposals are created for human approval unless confidence exceeds the auto-execute threshold.
| Agent | Trigger | Output |
|---|---|---|
| Triage | test:failed | Error classification + suggested fix |
| Discovery | run:completed, coverage:changed | Coverage gap report + test suggestions |
| Strategy | run:completed, defect:created | Test strategy proposal |
| Learning | run:completed | Pattern insights + KB articles |
| Generator | coverage:changed | Generated test cases |
| Executor | run:completed | Automated test execution |
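The confidence gating described above can be sketched in a few lines. The threshold value, interface, and function names here are illustrative assumptions; the source only states that low-confidence actions become proposals.

```typescript
// Hypothetical sketch of agent action gating: each action from the ReAct
// loop carries a confidence score; only actions above the auto-execute
// threshold run directly, everything else becomes a proposal for human review.
interface AgentAction {
  description: string;
  confidence: number; // 0..1, produced by the agent's reasoning step
}

type Disposition = "auto-executed" | "proposed";

// The 0.9 default threshold is an assumption, not Phoenix's actual setting.
function dispatch(action: AgentAction, autoExecuteThreshold = 0.9): Disposition {
  return action.confidence > autoExecuteThreshold ? "auto-executed" : "proposed";
}
```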
Tools load dynamically based on MCP connection status. Base: 13 database tools. Playwright MCP: +4. Filesystem MCP: +4. GitHub MCP: +5.
| Category | Count | Capabilities |
|---|---|---|
| Database | 13 | Query results, runs, suites, requirements, test cases, defects, flakes, coverage, agent proposals, system stats |
| Browser | 4 | Navigate URLs, click elements, fill forms, take screenshots via Playwright MCP |
| Code | 4 | Read test files, search code, list files, get info via Filesystem MCP |
| GitHub | 5 | PR diffs, repo files, code search, commits, issues via GitHub MCP |
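The dynamic tool loading above reduces to adding each connected server's tool count to the base set. A sketch under the counts quoted in the table; the function and map names are hypothetical.

```typescript
// Hypothetical sketch: the 13 database tools are always present, and each
// connected MCP server contributes its own tools on top.
const TOOL_COUNTS: Record<string, number> = {
  playwright: 4,
  filesystem: 4,
  github: 5,
};

function availableToolCount(connectedServers: string[]): number {
  const BASE_DATABASE_TOOLS = 13;
  return connectedServers.reduce(
    (total, server) => total + (TOOL_COUNTS[server] ?? 0),
    BASE_DATABASE_TOOLS,
  );
}
```

With no MCP servers connected the agent sees 13 tools; with all three connected it sees 26.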
| Report | Content |
|---|---|
| Execution Summary | Pass/fail breakdown, duration, trends |
| Coverage Matrix | Requirements vs test cases, gap analysis |
| Defect Summary | Severity distribution, status workflow |
| Flake Intelligence | Flaky test patterns, quarantine, trends |
| AI Performance | LLM usage, cost per capability, latency |
| Release Readiness | AI GO/NO-GO with confidence score |
| Compliance Evidence | SOC 2, HIPAA, GDPR, ISO 27001 mapping |
| Sprint Summary | Sprint velocity, execution, defect trends |
Also built: Live Commentary (AI narrates execution in real-time), Doc Generator (AI creates test documentation), Allure Import (parse external Allure reports).
Model Context Protocol enables standardized tool integration. Each server runs as a child process via STDIO with timeout protection and orphan cleanup.
| Server | Capabilities | Tools |
|---|---|---|
| Playwright | Browser automation, page inspection, screenshots | 18 capabilities |
| Filesystem | Read test files, search code patterns | 4 tools scoped to test-suites/ + src/ |
| GitHub | PR diffs, repo files, code search, commits | 5 read-only tools |
Safety: 30s connection timeout, 60s tool call timeout, orphaned process cleanup. Settings UI with quick-add presets for all 4 servers (Playwright, Vibium, Filesystem, GitHub).
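The timeout protection above can be sketched as a generic promise race: wrap any MCP call in a timer so a hung server cannot block the platform. The helper name is an assumption; the 30s/60s values mirror the limits quoted above.

```typescript
// Minimal sketch of timeout protection for MCP calls (hypothetical helper).
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Usage (connect/callTool are placeholders for the real MCP client calls):
// await withTimeout(server.connect(), 30_000, "MCP connect");
// await withTimeout(server.callTool("browser_navigate", args), 60_000, "tool call");
```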
| Role | Permissions |
|---|---|
| Owner | Full system control including ownership transfer |
| Admin | User management, settings, all artifacts, agent config |
| Lead | Create/edit artifacts, execute runs, approve proposals, manage schedules |
| Tester | Execute runs, log defects, read-only config |
| Viewer | Read-only access across all resources |
Scheduling: Cron expressions, event-triggered (run:completed triggers next suite), smart scheduling (AI decides when to run). Notifications: Email + Webhook with rate limiting, quiet hours, severity threshold. Synthetic Monitor: Periodic uptime checks with alerting.
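The notification gate described above combines three checks: severity threshold, quiet hours, and rate limit. A sketch only; the policy shape, field names, and severity levels are assumptions.

```typescript
// Hypothetical sketch of the notification gate: deliver an alert only if it
// meets the severity threshold, falls outside quiet hours, and the channel
// has not exhausted its rate limit for the current window.
const SEVERITY_RANK: Record<string, number> = { info: 0, warning: 1, critical: 2 };

interface NotificationPolicy {
  minSeverity: string;   // e.g. "warning"
  quietStartHour: number; // inclusive, 0-23
  quietEndHour: number;   // exclusive, 0-23
  maxPerWindow: number;   // rate limit per window
}

function shouldNotify(
  severity: string,
  hour: number,
  sentInWindow: number,
  policy: NotificationPolicy,
): boolean {
  if (SEVERITY_RANK[severity] < SEVERITY_RANK[policy.minSeverity]) return false;
  // Quiet hours may wrap past midnight (e.g. 22:00-07:00).
  const quiet =
    policy.quietStartHour < policy.quietEndHour
      ? hour >= policy.quietStartHour && hour < policy.quietEndHour
      : hour >= policy.quietStartHour || hour < policy.quietEndHour;
  if (quiet) return false;
  return sentInWindow < policy.maxPerWindow;
}
```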
Knowledge Base: RAG-powered repository with hybrid search (BM25 + vectorless semantic). Stores testing patterns, failure resolutions, best practices. AI Chat queries KB before answering for context-aware responses. Categories: Playwright, Failures, Best Practices, Phoenix How-To, Custom.
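The lexical half of that hybrid search can be sketched with a standard BM25 scorer. This is textbook BM25 with conventional parameters (k1=1.2, b=0.75), not Phoenix's actual ranking code; the semantic half would re-rank or blend with these scores.

```typescript
// Simplified BM25 scorer over pre-tokenized documents (illustrative only).
function bm25Score(
  queryTerms: string[],
  doc: string[],
  docs: string[][],
  k1 = 1.2,
  b = 0.75,
): number {
  const N = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / N;
  let score = 0;
  for (const term of queryTerms) {
    const tf = doc.filter((w) => w === term).length;
    if (tf === 0) continue;
    const df = docs.filter((d) => d.includes(term)).length;
    const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
    // Term frequency saturates via k1; b normalizes for document length.
    score +=
      (idf * (tf * (k1 + 1))) /
      (tf + k1 * (1 - b + b * (doc.length / avgLen)));
  }
  return score;
}
```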
Flake Intelligence: Tracks intermittently failing tests with pattern detection (timing/network/data/unknown). Quarantine system excludes flaky tests from release verdict. Flake count, total runs, first/last flaked dates, linked requirements. Dashboard widget shows active vs quarantined counts with weekly trend.
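The quarantine rule above can be sketched as a flake-rate threshold. The 20% cutoff and all names here are illustrative assumptions; the source states only that quarantined tests are excluded from the release verdict.

```typescript
// Hypothetical sketch of flake quarantine: a test whose flake rate exceeds
// a threshold is quarantined and dropped from the release-verdict population.
interface FlakeRecord {
  testId: string;
  flakeCount: number;
  totalRuns: number;
}

function isQuarantined(record: FlakeRecord, maxFlakeRate = 0.2): boolean {
  if (record.totalRuns === 0) return false;
  return record.flakeCount / record.totalRuns > maxFlakeRate;
}

// The release verdict considers only non-quarantined tests.
function verdictPopulation(records: FlakeRecord[]): string[] {
  return records.filter((r) => !isQuarantined(r)).map((r) => r.testId);
}
```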
Cost Observatory: Full LLM spend dashboard. Period selector (today/week/month). Breakdown by provider, task type, model. Budget meter with warning at 80%. Recent AI calls table. Cost optimization tips.
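The budget meter reduces to a spend-to-budget ratio with a warning band; the 80% warning threshold matches the figure quoted above, while the function name and "exceeded" state are assumptions.

```typescript
// Hypothetical sketch of the budget meter's state logic.
type BudgetStatus = "ok" | "warning" | "exceeded";

function budgetStatus(spendUsd: number, budgetUsd: number): BudgetStatus {
  const ratio = spendUsd / budgetUsd;
  if (ratio >= 1) return "exceeded";
  if (ratio >= 0.8) return "warning"; // warning band starts at 80% of budget
  return "ok";
}
```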
Built Phoenix TestAI and API Qortex as self-initiated projects to solve real problems in test automation and quality management. Not a developer by title — a QA Manager who saw the gaps in existing tools and decided to build something better. From architecture design to AI agent implementation, every line of code was written to prove that quality engineers can build the tools they need.