6 autonomous AI agents. 11 testing types. MCP protocol integration. Playwright + Vibium execution. Complete test lifecycle management. Self-hosted, zero vendor lock-in.
Tests fail. That is not the problem.
The problem is that most tools just report the failure and move on.
What if your platform could triage, learn, adapt,
and rise smarter after every run?
Phoenix does.
Task-aware routing: classify-failure → fast-accurate · release-verdict → reasoning · summarize-run → fast-cheap
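The routing above can be sketched as a simple lookup table. A minimal illustration only: the tier names come from the line above, but the function name and the fallback behavior are assumptions, not Phoenix's actual implementation.

```typescript
// Hypothetical sketch of task-aware model routing: each task type maps to a
// model tier with a different cost/latency/quality trade-off.
type ModelTier = "fast-accurate" | "reasoning" | "fast-cheap";

const ROUTES: Record<string, ModelTier> = {
  "classify-failure": "fast-accurate", // triage needs precision at speed
  "release-verdict": "reasoning",      // GO/NO-GO gets the strongest model
  "summarize-run": "fast-cheap",       // summaries tolerate a cheaper model
};

function routeTask(task: string): ModelTier {
  // Assumed fallback: unrecognized task types go to the cheapest tier.
  return ROUTES[task] ?? "fast-cheap";
}
```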
Each agent runs a ReAct loop: observe → reason → act → observe result. Proposals are created for human approval unless confidence exceeds the auto-execute threshold.
| Agent | Trigger | Output |
|---|---|---|
| Triage | test:failed | Error classification + suggested fix |
| Discovery | run:completed, coverage:changed | Coverage gap report + test suggestions |
| Strategy | run:completed, defect:created | Test strategy proposal |
| Learning | run:completed | Pattern insights + KB articles |
| Generator | coverage:changed | Generated test cases |
| Executor | run:completed | Automated test execution |
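The confidence gating described above can be sketched in a few lines. The threshold value, interface, and function names here are illustrative assumptions; the source only states that low-confidence actions become proposals.

```typescript
// Hypothetical sketch of agent action gating: each action from the ReAct
// loop carries a confidence score; only actions above the auto-execute
// threshold run directly, everything else becomes a proposal for human review.
interface AgentAction {
  description: string;
  confidence: number; // 0..1, produced by the agent's reasoning step
}

type Disposition = "auto-executed" | "proposed";

// The 0.9 default threshold is an assumption, not Phoenix's actual setting.
function dispatch(action: AgentAction, autoExecuteThreshold = 0.9): Disposition {
  return action.confidence > autoExecuteThreshold ? "auto-executed" : "proposed";
}
```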
Tools load dynamically based on MCP connection status. Base: 13 database tools. Playwright MCP: +4. Filesystem MCP: +4. GitHub MCP: +5.
| Category | Count | Capabilities |
|---|---|---|
| Database | 13 | Query results, runs, suites, requirements, test cases, defects, flakes, coverage, agent proposals, system stats |
| Browser | 4 | Navigate URLs, click elements, fill forms, take screenshots via Playwright MCP |
| Code | 4 | Read test files, search code, list files, get info via Filesystem MCP |
| GitHub | 5 | PR diffs, repo files, code search, commits, issues via GitHub MCP |
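The dynamic tool loading above reduces to adding each connected server's tool count to the base set. A sketch under the counts quoted in the table; the function and map names are hypothetical.

```typescript
// Hypothetical sketch: the 13 database tools are always present, and each
// connected MCP server contributes its own tools on top.
const TOOL_COUNTS: Record<string, number> = {
  playwright: 4,
  filesystem: 4,
  github: 5,
};

function availableToolCount(connectedServers: string[]): number {
  const BASE_DATABASE_TOOLS = 13;
  return connectedServers.reduce(
    (total, server) => total + (TOOL_COUNTS[server] ?? 0),
    BASE_DATABASE_TOOLS,
  );
}
```

With no MCP servers connected the agent sees 13 tools; with all three connected it sees 26.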
| Report | Content |
|---|---|
| Execution Summary | Pass/fail breakdown, duration, trends |
| Coverage Matrix | Requirements vs test cases, gap analysis |
| Defect Summary | Severity distribution, status workflow |
| Flake Intelligence | Flaky test patterns, quarantine, trends |
| AI Performance | LLM usage, cost per capability, latency |
| Release Readiness | AI GO/NO-GO with confidence score |
| Compliance Evidence | SOC 2, HIPAA, GDPR, ISO 27001 mapping |
| Sprint Summary | Sprint velocity, execution, defect trends |
Also built: Live Commentary (AI narrates execution in real-time), Doc Generator (AI creates test documentation), Allure Import (parse external Allure reports).
Model Context Protocol enables standardized tool integration. Each server runs as a child process via STDIO with timeout protection and orphan cleanup.
| Server | Capabilities | Tools |
|---|---|---|
| Playwright | Browser automation, page inspection, screenshots | 18 capabilities |
| Filesystem | Read test files, search code patterns | 4 tools scoped to test-suites/ + src/ |
| GitHub | PR diffs, repo files, code search, commits | 5 read-only tools |
Safety: 30s connection timeout, 60s tool call timeout, orphaned process cleanup. Settings UI with quick-add presets for all 4 servers (Playwright, Vibium, Filesystem, GitHub).
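The timeout protection above can be sketched as a generic promise race: wrap any MCP call in a timer so a hung server cannot block the platform. The helper name is an assumption; the 30s/60s values mirror the limits quoted above.

```typescript
// Minimal sketch of timeout protection for MCP calls (hypothetical helper).
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Usage (connect/callTool are placeholders for the real MCP client calls):
// await withTimeout(server.connect(), 30_000, "MCP connect");
// await withTimeout(server.callTool("browser_navigate", args), 60_000, "tool call");
```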
| Role | Permissions |
|---|---|
| Owner | Full system control including ownership transfer |
| Admin | User management, settings, all artifacts, agent config |
| Lead | Create/edit artifacts, execute runs, approve proposals, manage schedules |
| Tester | Execute runs, log defects, read-only config |
| Viewer | Read-only access across all resources |
Scheduling: Cron expressions, event-triggered (run:completed triggers next suite), smart scheduling (AI decides when to run). Notifications: Email + Webhook with rate limiting, quiet hours, severity threshold. Synthetic Monitor: Periodic uptime checks with alerting.
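The notification gate described above combines three checks: severity threshold, quiet hours, and rate limit. A sketch only; the policy shape, field names, and severity levels are assumptions.

```typescript
// Hypothetical sketch of the notification gate: deliver an alert only if it
// meets the severity threshold, falls outside quiet hours, and the channel
// has not exhausted its rate limit for the current window.
const SEVERITY_RANK: Record<string, number> = { info: 0, warning: 1, critical: 2 };

interface NotificationPolicy {
  minSeverity: string;   // e.g. "warning"
  quietStartHour: number; // inclusive, 0-23
  quietEndHour: number;   // exclusive, 0-23
  maxPerWindow: number;   // rate limit per window
}

function shouldNotify(
  severity: string,
  hour: number,
  sentInWindow: number,
  policy: NotificationPolicy,
): boolean {
  if (SEVERITY_RANK[severity] < SEVERITY_RANK[policy.minSeverity]) return false;
  // Quiet hours may wrap past midnight (e.g. 22:00-07:00).
  const quiet =
    policy.quietStartHour < policy.quietEndHour
      ? hour >= policy.quietStartHour && hour < policy.quietEndHour
      : hour >= policy.quietStartHour || hour < policy.quietEndHour;
  if (quiet) return false;
  return sentInWindow < policy.maxPerWindow;
}
```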
Knowledge Base: RAG-powered repository with hybrid search (BM25 + vectorless semantic). Stores testing patterns, failure resolutions, best practices. AI Chat queries KB before answering for context-aware responses. Categories: Playwright, Failures, Best Practices, Phoenix How-To, Custom.
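The lexical half of that hybrid search can be sketched with a standard BM25 scorer. This is textbook BM25 with conventional parameters (k1=1.2, b=0.75), not Phoenix's actual ranking code; the semantic half would re-rank or blend with these scores.

```typescript
// Simplified BM25 scorer over pre-tokenized documents (illustrative only).
function bm25Score(
  queryTerms: string[],
  doc: string[],
  docs: string[][],
  k1 = 1.2,
  b = 0.75,
): number {
  const N = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / N;
  let score = 0;
  for (const term of queryTerms) {
    const tf = doc.filter((w) => w === term).length;
    if (tf === 0) continue;
    const df = docs.filter((d) => d.includes(term)).length;
    const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
    // Term frequency saturates via k1; b normalizes for document length.
    score +=
      (idf * (tf * (k1 + 1))) /
      (tf + k1 * (1 - b + b * (doc.length / avgLen)));
  }
  return score;
}
```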
Flake Intelligence: Tracks intermittently failing tests with pattern detection (timing/network/data/unknown). Quarantine system excludes flaky tests from release verdict. Flake count, total runs, first/last flaked dates, linked requirements. Dashboard widget shows active vs quarantined counts with weekly trend.
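The quarantine rule above can be sketched as a flake-rate threshold. The 20% cutoff and all names here are illustrative assumptions; the source states only that quarantined tests are excluded from the release verdict.

```typescript
// Hypothetical sketch of flake quarantine: a test whose flake rate exceeds
// a threshold is quarantined and dropped from the release-verdict population.
interface FlakeRecord {
  testId: string;
  flakeCount: number;
  totalRuns: number;
}

function isQuarantined(record: FlakeRecord, maxFlakeRate = 0.2): boolean {
  if (record.totalRuns === 0) return false;
  return record.flakeCount / record.totalRuns > maxFlakeRate;
}

// The release verdict considers only non-quarantined tests.
function verdictPopulation(records: FlakeRecord[]): string[] {
  return records.filter((r) => !isQuarantined(r)).map((r) => r.testId);
}
```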
Cost Observatory: Full LLM spend dashboard. Period selector (today/week/month). Breakdown by provider, task type, model. Budget meter with warning at 80%. Recent AI calls table. Cost optimization tips.
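The budget meter reduces to a spend-to-budget ratio with a warning band; the 80% warning threshold matches the figure quoted above, while the function name and "exceeded" state are assumptions.

```typescript
// Hypothetical sketch of the budget meter's state logic.
type BudgetStatus = "ok" | "warning" | "exceeded";

function budgetStatus(spendUsd: number, budgetUsd: number): BudgetStatus {
  const ratio = spendUsd / budgetUsd;
  if (ratio >= 1) return "exceeded";
  if (ratio >= 0.8) return "warning"; // warning band starts at 80% of budget
  return "ok";
}
```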
Built Phoenix TestAI and API Qortex as self-initiated projects to solve real problems in test automation and quality management. Not a developer by title — a QA Manager who saw the gaps in existing tools and decided to build something better. From architecture design to AI agent implementation, every line of code was written to prove that quality engineers can build the tools they need.