AI-Native · Self-Hosted · MCP-Ready

The Quality Intelligence Platform

Phoenix TestAI
From Every Failure, Intelligence Rises.

6 autonomous AI agents. 11 testing types. MCP protocol integration. Playwright execution today, Vibium planned. Complete test lifecycle management. Self-hosted, zero vendor lock-in.

80
API Routes
31
Data Models
26
AI Tools
6
AI Agents
8
Report Types
11
Testing Types

Tests fail. That is not the problem.
The problem is that most tools just report the failure and move on.

What if your platform could triage, learn, adapt,
and rise smarter after every run?

Phoenix does.

Platform Capabilities
Everything a Quality Team Needs.
From test execution to AI-driven insights. Every capability solves a real problem. Built features ship today. Pipeline items are coming next.
Built
📊
Command Centre
13 configurable widgets: release verdict, execution trends, module health, flake tracking, agent status, coverage gaps. Widget configurator for layout customization. All data live via SSE.
AI Verdict · 14 Widgets · Configurator
Built
💬
26-Tool AI Chat
Context-aware AI assistant. 13 DB tools + 4 browser tools (Playwright MCP) + 4 code tools (Filesystem MCP) + 5 repo tools (GitHub MCP). Tools load dynamically based on MCP connections.
13 DB Tools · 4 Browser · 5 GitHub · 4 Code
Built
🤖
6 Autonomous Agents
Triage, Discovery, Strategy, Learning, Generator, Executor. Event-driven with ReAct reasoning loop, confidence scoring, 3 autonomy levels, and human-in-loop approval via Proposals page.
ReAct Loop · Event Bus · Proposals UI
Built
📋
Test Management
Requirements with Jira import + AI expansion. Test Cases with steps. Test Plans with manual execution. Defects with 7-status workflow. Requirements Traceability Matrix linking everything.
Requirements · Test Cases · Defects · RTM
Built
🔌
MCP Integration
Model Context Protocol with Playwright, Filesystem, and GitHub servers. Server registry with connect/disconnect. Quick-add presets. Agents and chat access all MCP tools through standard interface.
Playwright · Filesystem · GitHub · Registry
Built
📈
Reports & Analytics
8 HTML report types. 8 analytics widgets. Compliance dashboards (SOC 2, HIPAA, GDPR, ISO 27001). AI-generated release reports. Live commentary during execution. Doc generator. Allure import.
8 Reports · Analytics · Compliance · Allure
Built
🛡
Self-Healer
6-category AI failure diagnosis covering ALL failure types (not just selectors). Selectors 28%, Timing 25%, Runtime 15%, Test Data 15%, Visual 10%, Interactions 7%. Auto-fix with confidence scoring.
6 Categories · Auto-Fix · Re-verify
Built
AI Test Generator
Describe what to test in plain English. AI generates Playwright test files with interactive checklist review. Save directly to suites. 3 modes: from text, from URL, from existing test.
3 Modes · Checklist · Save to Suite
Built
🔒
Enterprise & Security
5-role RBAC (Owner, Admin, Lead, Tester, Viewer) with 16 resources. User management. Scheduling (cron + event + smart). Notifications (email + webhook). Audit trail. JWT sessions. REST API for CI/CD.
5 Roles · Scheduling · Notifications · REST API
Built
💰
Cost Observatory
LLM spend dashboard with budget tracking. Daily/weekly/monthly breakdowns by provider, model, and task type. Budget alerts at 80% threshold. Cost optimization recommendations.
Budget Track · By Provider · Alerts
Built
📚
Knowledge Base + Flake Intelligence
RAG-powered knowledge base with hybrid search. Flake tracking with quarantine, pattern detection (timing/network/data), and trend analysis. Synthetic monitor for uptime checks.
RAG Search · Flake Track · Monitor
Built
🎨
4-Theme Design System
Sapphire (blue), Ember (warm), Violet (purple), Teal (green). Plus Jakarta Sans + JetBrains Mono. Run Config Dialog with browser/device/mode selection. Live commentary during execution.
Sapphire · Ember · Violet · Teal
Testing Dimensions
11 Testing Types. One Platform.
From web E2E to chaos resilience. Every testing dimension your team needs in one unified platform: two are built today, nine are planned.
🌐
Web E2E Testing
Playwright-powered browser automation with live SSE streaming and AI failure analysis
Built
📱
Cross-Browser & Device
Chromium, Firefox, WebKit + iPhone, Pixel, iPad, Galaxy Tab via Run Config Dialog
Built
🔍
Exploratory Testing
AI-assisted guided exploration with session recording and automatic finding capture
Planned
Performance Testing
k6 and Lighthouse integration for load testing, response time profiling, and Core Web Vitals
Planned
🛡
Security Testing
OWASP scanning, vulnerability detection, security finding tracking with severity classification
Planned
👁
Visual Regression
Screenshot comparison with baselines, pixel-diff analysis, AI-powered change filtering
Planned
🔗
API Protocol Testing
REST, GraphQL, WebSocket, gRPC via MCP-connected execution engines
Planned
Accessibility Testing
WCAG compliance checking, aria validation, keyboard navigation verification
Planned
📜
Contract Testing
API contract validation against OpenAPI specs with automated drift detection
Planned
📊
Data-Driven Testing
Parameterized test execution with CSV/JSON datasets, dynamic variable injection
Planned
🔥
Chaos & Resilience Testing
Fault injection, network degradation simulation, recovery time validation. Test how your application behaves under stress and failure conditions.
Planned
AI Engine
5 Providers. 3-Tier Fallback. Zero Lock-In.
Every AI feature works with any provider. Task-aware routing picks the best model per job. Automatic failover when a provider is down.
1
OpenRouter
200+ models · Free tiers
2
Anthropic
Claude · Reasoning
3
OpenAI
GPT-4o · Fast
4
Ollama
Local · Private
5
Azure
Enterprise

Task-aware routing: classify-failure → fast-accurate · release-verdict → reasoning · summarize-run → fast-cheap
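
The routing line above can be read as a lookup from task to model tier, followed by an ordered walk through providers until one responds. The sketch below is illustrative only: the task names and tier labels come from that line, while the provider order per tier, the `isUp` health callback, and the function name `pickProvider` are assumptions, not Phoenix's actual router.

```typescript
// Task names and tier labels are taken from the routing line; the rest is hypothetical.
type Tier = "fast-accurate" | "reasoning" | "fast-cheap";
type Task = "classify-failure" | "release-verdict" | "summarize-run";

const TASK_TIER: Record<Task, Tier> = {
  "classify-failure": "fast-accurate",
  "release-verdict": "reasoning",
  "summarize-run": "fast-cheap",
};

// Assumed fallback chains: which provider serves which tier is NOT stated in the source.
const FALLBACK_CHAIN: Record<Tier, string[]> = {
  "fast-accurate": ["OpenAI", "Anthropic", "OpenRouter"],
  "reasoning": ["Anthropic", "OpenAI", "OpenRouter"],
  "fast-cheap": ["OpenRouter", "Ollama", "OpenAI"],
};

// Return the first provider in the task's chain that the health check reports as up.
function pickProvider(task: Task, isUp: (provider: string) => boolean): string | null {
  const chain = FALLBACK_CHAIN[TASK_TIER[task]];
  for (const provider of chain) {
    if (isUp(provider)) return provider;
  }
  return null; // every provider in the chain is down
}
```

Keeping the task-to-tier map separate from the provider chains means a provider can be swapped without touching the task definitions.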

Self-Healer
6 Failure Categories. Not Just Selectors.
Competitors heal only broken selectors (28% of failures). Phoenix covers all 6 categories.
28%
Selectors
CSS/ID changed → upgrade to getByRole/getByTestId
25%
Timing
Element loads late → add resilient wait/polling
15%
Runtime Errors
App crash/env issue → retry after restart
15%
Test Data
Expired session/stale data → refresh data
10%
Visual Assertions
Render diff → filter irrelevant changes
7%
Interaction Changes
Element behind overlay → add prerequisite steps
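
To make the coverage claim concrete, the six categories above can be written as a small table and summed. This is a sketch only: the percentages, symptoms, and remedies are copied from the list above, while the type and function names are hypothetical.

```typescript
interface FailureCategory {
  name: string;
  sharePct: number; // share of failures, as listed above
  symptom: string;
  remedy: string;
}

const CATEGORIES: FailureCategory[] = [
  { name: "Selectors", sharePct: 28, symptom: "CSS/ID changed", remedy: "upgrade to getByRole/getByTestId" },
  { name: "Timing", sharePct: 25, symptom: "Element loads late", remedy: "add resilient wait/polling" },
  { name: "Runtime Errors", sharePct: 15, symptom: "App crash/env issue", remedy: "retry after restart" },
  { name: "Test Data", sharePct: 15, symptom: "Expired session/stale data", remedy: "refresh data" },
  { name: "Visual Assertions", sharePct: 10, symptom: "Render diff", remedy: "filter irrelevant changes" },
  { name: "Interaction Changes", sharePct: 7, symptom: "Element behind overlay", remedy: "add prerequisite steps" },
];

// Sum the failure share handled by a healer that supports only the named categories.
function coveredShare(supported: string[]): number {
  return CATEGORIES
    .filter((c) => supported.indexOf(c.name) !== -1)
    .reduce((sum, c) => sum + c.sharePct, 0);
}

const selectorsOnly = coveredShare(["Selectors"]);
const allSix = coveredShare(CATEGORIES.map((c) => c.name));
console.log(selectorsOnly, allSix);
```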
Execution Engines
Playwright Today. Vibium Tomorrow.
Two execution engines. Playwright for proven reliability. Vibium for AI-native testing.
🎭
Playwright
ACTIVE
Microsoft's battle-tested browser automation. Chromium, Firefox, WebKit. Headed and headless modes.
Live SSE streaming during execution
AI failure analysis on every failure
Run Config: browser + device + workers + retries
MCP server for AI agent browser access
Auto-retry with configurable count
🔥
Vibium
PLANNED
AI-native browser engine with WebDriver BiDi. Next-generation test execution for intelligent automation.
AI-assisted exploratory testing
Visual regression with AI diff filtering
Page Object auto-generation from crawling
Security scanning (OWASP)
Performance profiling (k6/Lighthouse)
In Action
See It Working
Screenshots from the live platform.
📊
Command Centre · Dashboard
💬
AI Chat + MCP · AI
🤖
Agent Proposals · Agents
🧪
Test Cases + RTM · Management
📈
Reports Hub · Reports
💰
Cost Observatory · AI Costs
🔌
MCP Servers · MCP
Test Generator · AI
🎨
4 Themes · Design
📋
Compliance · Enterprise

Self-Hosted
Your Server. Your Data. Zero Cost.

✕ Typical Test Management Tools

$30-$79 per user per month
Your test data on their cloud
AI features cost extra or absent
No autonomous agents
No MCP protocol support
No local AI option
Limited test execution capabilities

✓ Phoenix TestAI

$0 forever. No per-seat cost.
Single SQLite file on your server
5-provider AI with free tiers
6 autonomous AI agents
MCP protocol integration
Ollama for 100% private AI
Playwright execution today, Vibium planned
Under the Hood
7-Layer Architecture
Seven architectural layers. No scaffolding tools, no generators.
🌐
Frontend
27 pages, 4-theme design system, widget configurator, responsive layouts
Next.js 15 · React 19 · Tailwind v4 · shadcn/ui
🧠
AI Intelligence
LLM Router with 3-tier fallback, 5 providers, Cost Observatory, task-aware routing
OpenRouter · Anthropic · Ollama
🤖
Agent Framework
6 agents, ReAct loop, event-driven orchestrator, proposal system, 3 autonomy levels
ReAct · EventEmitter · Proposals
🔌
MCP Integration
Playwright, Filesystem, GitHub via Model Context Protocol. STDIO transport, auto-connect
MCP SDK · stdio · globalThis
🛡
Enterprise
5-role RBAC, scheduling, notifications, 8 reports, compliance, audit trail, user management
RBAC · Cron · SSE · JWT
Execution Engine
Playwright runner, live SSE streaming, AI failure analysis, auto-retry, run config dialog
Playwright · child_process
💾
Data Layer
31 Prisma models, event bus (8 event types), audit trail, SQLite dev / PostgreSQL prod
Prisma · SQLite · Zustand
Tech Stack
Next.js 15 · React 19 · TypeScript 5 · Prisma ORM · SQLite / PostgreSQL · Tailwind CSS v4 · Zustand 5 · shadcn/ui · Playwright · MCP SDK · Auth.js (JWT) · SSE Streaming · Recharts · Zod · node-cron · Nodemailer · bcryptjs
Roadmap
From Foundation to Frontier
9 phases, numbered 0 through 8. Built sequentially. Each phase adds a complete architectural layer.
Phase 0-2 · Complete
Foundation + Execution + Intelligence
Auth, landing page, dashboard, suite explorer, Playwright runner, live SSE streaming, report generation, AI failure analysis, real data integration, demo app with SauceDemo tests.
12 Sessions
Phase 3 · Complete
AI Layer
LLM Router with 3-tier fallback and 5 providers. Cost Observatory. AI Chat with 13 tools. Self-Healer (6 failure categories). Test Generator with interactive checklist. Knowledge Base with hybrid RAG search.
6 AI Features
Phase 4 · Complete
Test Management
Requirements (Jira import, AI expansion). Test Cases (steps, automation linking). Defects (7-status workflow). Test Plans (manual execution). RTM linking everything. Run Config Dialog.
5 Modules
Phase 5 · Complete
Agentic Intelligence
Event bus (8 event types). 6 autonomous agents with ReAct reasoning. Orchestrator with cooldown. Proposal system (approve/reject/auto-execute). REST API for CI/CD. Agent dashboard widgets.
6 Agents + 14 Sessions
Phase 6 · Complete
Enterprise
5-role RBAC + user management. Scheduling (cron/event/smart). Notifications (email/webhook). 8 HTML report types + analytics + compliance dashboards (SOC 2, HIPAA, GDPR, ISO). Live commentary. Doc generator. Synthetic monitor. Allure import. 4 themes. Widget configurator.
16 Sessions
Phase 7 · In Progress
Expansion & Protocol
MCP Foundation (registry, connection manager). Playwright MCP (18 capabilities). Filesystem MCP. GitHub MCP. Upcoming: Vibium engine, exploratory testing, API protocol testing (REST/GraphQL/WebSocket/gRPC), visual regression, security scanning, performance profiling, page object generator.
4 of 12 Sessions
Phase 8 · Planned
Polish & Hardening
Security hardening (input validation, rate limiting, Zod schemas). Performance optimization. Accessibility audit. Dependency management. Comprehensive E2E test coverage. Docker deployment.
Planned
Deep Reference
Every Feature, Explained
🤖
6 AI Agents with ReAct Reasoning
6 agents

Each agent runs a ReAct loop: observe → reason → act → observe result. Proposals created for human approval unless confidence exceeds auto-execute threshold.

Agent | Trigger | Output
Triage | test:failed | Error classification + suggested fix
Discovery | run:completed, coverage:changed | Coverage gap report + test suggestions
Strategy | run:completed, defect:created | Test strategy proposal
Learning | run:completed | Pattern insights + KB articles
Generator | coverage:changed | Generated test cases
Executor | run:completed | Automated test execution
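
The trigger table above amounts to a subscription map from events to agents. A minimal sketch of that dispatch follows; the agent names and event names come from the table, while the `agentsFor` helper and the data layout are assumptions rather than the platform's actual event bus.

```typescript
type AgentName = "Triage" | "Discovery" | "Strategy" | "Learning" | "Generator" | "Executor";
type AgentEvent = "test:failed" | "run:completed" | "coverage:changed" | "defect:created";

// Which events wake which agent, copied from the trigger table.
const TRIGGERS: Record<AgentName, AgentEvent[]> = {
  Triage: ["test:failed"],
  Discovery: ["run:completed", "coverage:changed"],
  Strategy: ["run:completed", "defect:created"],
  Learning: ["run:completed"],
  Generator: ["coverage:changed"],
  Executor: ["run:completed"],
};

// Fixed order so results are deterministic (same order as the table).
const AGENT_ORDER: AgentName[] = ["Triage", "Discovery", "Strategy", "Learning", "Generator", "Executor"];

// List every agent subscribed to the given event.
function agentsFor(event: AgentEvent): AgentName[] {
  return AGENT_ORDER.filter((agent) => TRIGGERS[agent].indexOf(event) !== -1);
}

console.log(agentsFor("run:completed"));
```

Note that a single `run:completed` event fans out to four agents, which is one reason an orchestrator with a cooldown is useful.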
💬
26-Tool AI Chat System
26 tools

Tools load dynamically based on MCP connection status. Base: 13 database tools. Playwright MCP: +4. Filesystem MCP: +4. GitHub MCP: +5.

Category | Count | Capabilities
Database | 13 | Query results, runs, suites, requirements, test cases, defects, flakes, coverage, agent proposals, system stats
Browser | 4 | Navigate URLs, click elements, fill forms, take screenshots via Playwright MCP
Code | 4 | Read test files, search code, list files, get info via Filesystem MCP
GitHub | 5 | PR diffs, repo files, code search, commits, issues via GitHub MCP
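
The arithmetic behind the "26 tools" figure can be sketched as a base set plus whatever each connected MCP server contributes. The counts below come from the text above; the server keys and the `availableToolCount` function are hypothetical names used only for illustration.

```typescript
type McpServer = "playwright" | "filesystem" | "github";

const BASE_DB_TOOLS = 13; // always available, no MCP connection required

// Extra tools contributed by each connected MCP server.
const MCP_TOOLS: Record<McpServer, number> = {
  playwright: 4,
  filesystem: 4,
  github: 5,
};

// Count the tools the chat can use for a given set of connected servers.
function availableToolCount(connected: McpServer[]): number {
  // Ignore duplicates so a server listed twice is only counted once.
  const unique = connected.filter((server, i) => connected.indexOf(server) === i);
  return unique.reduce((total, server) => total + MCP_TOOLS[server], BASE_DB_TOOLS);
}

console.log(availableToolCount([]));                                     // database tools only
console.log(availableToolCount(["playwright", "filesystem", "github"])); // full set
```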
📈
8 Report Types + Live Commentary + Allure
8+ reports
Report | Content
Execution Summary | Pass/fail breakdown, duration, trends
Coverage Matrix | Requirements vs test cases, gap analysis
Defect Summary | Severity distribution, status workflow
Flake Intelligence | Flaky test patterns, quarantine, trends
AI Performance | LLM usage, cost per capability, latency
Release Readiness | AI GO/NO-GO with confidence score
Compliance Evidence | SOC 2, HIPAA, GDPR, ISO 27001 mapping
Sprint Summary | Sprint velocity, execution, defect trends

Also built: Live Commentary (AI narrates execution in real-time), Doc Generator (AI creates test documentation), Allure Import (parse external Allure reports).

🔌
MCP Protocol + 3 Servers
3 servers

Model Context Protocol enables standardized tool integration. Each server runs as a child process via STDIO with timeout protection and orphan cleanup.

Server | Capabilities | Tools
Playwright | Browser automation, page inspection, screenshots | 18 capabilities
Filesystem | Read test files, search code patterns | 4 tools scoped to test-suites/ + src/
GitHub | PR diffs, repo files, code search, commits | 5 read-only tools

Safety: 30s connection timeout, 60s tool call timeout, orphaned process cleanup. Settings UI with quick-add presets for all 4 servers (Playwright, Vibium, Filesystem, GitHub).
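
One common way to enforce limits like the 30s and 60s timeouts above is to race the operation against a timer. The sketch below shows that general pattern under stated assumptions: the two constants mirror the figures above, but `withTimeout` is a hypothetical helper, not the platform's code, and rejecting the promise does not by itself stop the child process (that is what the separate orphan cleanup is for).

```typescript
const CONNECT_TIMEOUT_MS = 30000;   // 30s connection timeout, from the text above
const TOOL_CALL_TIMEOUT_MS = 60000; // 60s tool call timeout, from the text above

// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way so it cannot leak.
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    },
  );
}
```

A caller would wrap each operation with the matching limit, for example `withTimeout(connect(), CONNECT_TIMEOUT_MS, "connect")`, where `connect` stands in for whatever function opens the server connection.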

🛡
Enterprise: RBAC + Scheduling + Notifications
5 roles
Role | Permissions
Owner | Full system control including ownership transfer
Admin | User management, settings, all artifacts, agent config
Lead | Create/edit artifacts, execute runs, approve proposals, manage schedules
Tester | Execute runs, log defects, read-only config
Viewer | Read-only access across all resources
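
The role table above reads like a ladder in which each role adds to the one below it. Assuming that is the case (the source does not say the roles are strictly cumulative), a permission check can be sketched as a rank comparison. The action names and the minimum role assigned to each are illustrative guesses drawn from the table, not the platform's 16 actual resources.

```typescript
// Assumed ordering: higher rank includes everything below it.
const ROLE_RANK = { Viewer: 0, Tester: 1, Lead: 2, Admin: 3, Owner: 4 } as const;
type Role = keyof typeof ROLE_RANK;

// Hypothetical action names; the real resource list is not given in the source.
type Action =
  | "read"
  | "execute-run"
  | "log-defect"
  | "approve-proposal"
  | "manage-schedules"
  | "manage-users"
  | "transfer-ownership";

// Lowest role allowed to perform each action, inferred from the role table.
const MIN_ROLE: Record<Action, Role> = {
  "read": "Viewer",
  "execute-run": "Tester",
  "log-defect": "Tester",
  "approve-proposal": "Lead",
  "manage-schedules": "Lead",
  "manage-users": "Admin",
  "transfer-ownership": "Owner",
};

// True when the role's rank meets or exceeds the action's minimum role.
function can(role: Role, action: Action): boolean {
  return ROLE_RANK[role] >= ROLE_RANK[MIN_ROLE[action]];
}

console.log(can("Tester", "approve-proposal")); // false
console.log(can("Lead", "approve-proposal"));   // true
```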

Scheduling: Cron expressions, event-triggered (run:completed triggers next suite), smart scheduling (AI decides when to run).

Notifications: Email + Webhook with rate limiting, quiet hours, severity threshold.

Synthetic Monitor: Periodic uptime checks with alerting.

📚
Knowledge Base + Flake Intelligence + Cost Observatory
3 systems

Knowledge Base: RAG-powered repository with hybrid search (BM25 + vectorless semantic). Stores testing patterns, failure resolutions, best practices. AI Chat queries KB before answering for context-aware responses. Categories: Playwright, Failures, Best Practices, Phoenix How-To, Custom.

Flake Intelligence: Tracks intermittently failing tests with pattern detection (timing/network/data/unknown). Quarantine system excludes flaky tests from release verdict. Flake count, total runs, first/last flaked dates, linked requirements. Dashboard widget shows active vs quarantined counts with weekly trend.
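
The effect of quarantine on the release verdict can be illustrated with a simple filter. In the sketch below, quarantined tests are dropped before the verdict is computed, as described above; the "GO only if every remaining test passed" rule is a stand-in assumption, since the platform's verdict is described as AI-generated, and all type and function names are hypothetical.

```typescript
interface TestOutcome {
  name: string;
  passed: boolean;
  quarantined: boolean;
}

// Keep only the outcomes that should count toward the release verdict.
function verdictInputs(outcomes: TestOutcome[]): TestOutcome[] {
  return outcomes.filter((o) => !o.quarantined);
}

// Assumed rule for illustration: GO only when every non-quarantined test passed.
function releaseVerdict(outcomes: TestOutcome[]): "GO" | "NO-GO" {
  return verdictInputs(outcomes).every((o) => o.passed) ? "GO" : "NO-GO";
}

const run: TestOutcome[] = [
  { name: "login", passed: true, quarantined: false },
  { name: "checkout", passed: true, quarantined: false },
  { name: "flaky-cart-badge", passed: false, quarantined: true },
];
console.log(releaseVerdict(run));
```

With this rule an empty or fully quarantined run also yields "GO", so a real implementation would likely want a minimum-coverage guard as well.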

Cost Observatory: Full LLM spend dashboard. Period selector (today/week/month). Breakdown by provider, task type, model. Budget meter with warning at 80%. Recent AI calls table. Cost optimization tips.
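
The 80% warning described above reduces to a ratio check. A minimal sketch follows, assuming the warning fires at exactly 80% and above and that a separate "exceeded" state exists at 100%; neither detail is stated in the source, and `budgetStatus` is a hypothetical name.

```typescript
type BudgetStatus = "ok" | "warning" | "exceeded";

const WARNING_RATIO = 0.8; // warning threshold from the text above

// Classify current spend against the configured budget.
function budgetStatus(spentUsd: number, budgetUsd: number): BudgetStatus {
  if (budgetUsd <= 0) throw new RangeError("budget must be positive");
  const ratio = spentUsd / budgetUsd;
  if (ratio >= 1) return "exceeded"; // assumed extra state, not in the source
  if (ratio >= WARNING_RATIO) return "warning";
  return "ok";
}

console.log(budgetStatus(79, 100));
console.log(budgetStatus(80, 100));
console.log(budgetStatus(120, 100));
```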

FAQ
Common Questions
What makes Phoenix TestAI different from TestRail, Qase, or Testomat.io?
Phoenix is AI-native, not AI-bolted.
  • 6 autonomous agents that triage, discover, learn, and act
  • MCP protocol lets agents browse pages, read code, inspect PRs
  • Self-Healer covers ALL 6 failure categories (competitors only fix selectors)
  • Self-hosted with local AI (Ollama) option
  • Complete lifecycle in one platform: Requirements, Cases, Plans, Defects, Execution, Reports, Analytics, Compliance
How do the AI agents work?
ReAct (Reason + Act) reasoning loop:
  • Event triggers agent (e.g., test:failed triggers Triage)
  • Agent observes context using specialized tools
  • Reasons about action via LLM
  • Creates proposal with confidence score + risk level
  • Auto-executes if confidence exceeds threshold, otherwise waits for human approval
3 autonomy levels: supervised (always wait), semi-autonomous (auto for low-risk), fully autonomous.
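
The approval rule and the three autonomy levels above can be combined into one decision function. This is a sketch under assumptions: the level names follow the text, but the 0.9 default threshold, the three-value risk scale, and the `decide` function are illustrative and not taken from the platform.

```typescript
type Autonomy = "supervised" | "semi-autonomous" | "fully-autonomous";
type Risk = "low" | "medium" | "high"; // assumed scale; the source only mentions "low-risk"
type Decision = "auto-execute" | "await-approval";

interface Proposal {
  confidence: number; // 0..1
  risk: Risk;
}

// Decide whether a proposal runs immediately or waits on the Proposals page.
function decide(p: Proposal, level: Autonomy, threshold: number = 0.9): Decision {
  if (level === "supervised") return "await-approval";      // always wait
  if (p.confidence <= threshold) return "await-approval";   // must exceed the threshold
  if (level === "semi-autonomous" && p.risk !== "low") {
    return "await-approval";                                // auto only for low-risk
  }
  return "auto-execute";
}

console.log(decide({ confidence: 0.95, risk: "low" }, "semi-autonomous"));
console.log(decide({ confidence: 0.95, risk: "high" }, "semi-autonomous"));
```

Checking the autonomy level first keeps the supervised mode safe even if a confidence score is miscalibrated.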
What AI providers does it support?
5 providers with 3-tier fallback: OpenRouter (200+ models, free tiers), Anthropic (Claude), OpenAI (GPT-4o), Ollama (local, private), Azure (enterprise). The LLM Router selects the best model per task based on capability, cost, and availability.
What is MCP and why does it matter?
Model Context Protocol is an open standard for AI tool integration. Instead of hardcoded integrations, Phoenix uses MCP to connect to Playwright (browse pages), Filesystem (read code), and GitHub (inspect PRs). Any future MCP server can be added without code changes. This means Phoenix AI capabilities grow as the MCP ecosystem grows.
What testing types are supported?
11 testing dimensions: Web E2E (built), Cross-Browser/Device (built via Run Config), and 9 planned types: Exploratory, Performance (k6/Lighthouse), Security (OWASP), Visual Regression, API Protocol (REST/GraphQL/WebSocket/gRPC), Accessibility, Contract, Data-Driven, and Chaos/Resilience testing.
Is it production-ready?
Yes. 80 API routes with RBAC. 31 Prisma models. 5-role access control. Event-driven architecture with audit trail. JWT sessions. 4 compliance frameworks. 78 E2E + 30 unit + 36 integration tests.
What hardware do I need?
Minimum: any machine with Node.js 18+ and 1GB RAM. With Ollama: a GPU with 4GB+ VRAM. With cloud AI: no GPU needed. Database: SQLite (default) or PostgreSQL for production.
Krishna Praveen Manchala
Senior QA Manager · 17+ Years in Quality Engineering

Built Phoenix TestAI and API Qortex as self-initiated projects to solve real problems in test automation and quality management. Not a developer by title — a QA Manager who saw the gaps in existing tools and decided to build something better. From architecture design to AI agent implementation, every line of code was written to prove that quality engineers can build the tools they need.

API Qortex · Phoenix TestAI