Architecture
How AI TestPilot X is built — LangGraph orchestration, agent design, storage, and deployment.
System diagram
graph LR
subgraph CLI["CLI / Python API"]
A[testpilot run]
B[Python API]
end
subgraph Orchestrator["LangGraph Orchestrator"]
C[GlobalState]
D[AgentRegistry]
E[CheckpointStore]
F[HITL Gate]
end
subgraph Agents["10 AI Agents"]
G[RequirementAgent]
H[TestCaseAgent]
I[VerificationAgent]
J[SeleniumAgent]
K[APIAgent]
L[ExecutionAgent]
M[BugAgent]
N[HealingAgent]
O[ReportAgent]
end
subgraph Storage["Storage Layer"]
P[(SQLite)]
Q[(ChromaDB)]
R[Logs / Artifacts]
end
subgraph LLM["LLM Layer"]
S[Gemini 2.5 Flash]
end
A --> C
B --> C
C --> D
D --> G --> H --> I --> J
D --> K
J --> L
K --> L
L --> F --> M --> N --> O
O --> C
G & H & I & J & K & M & N & O --> S
H & M --> Q
L & O --> P
L --> R
LangGraph orchestration
The pipeline is a directed acyclic graph with one conditional branch (HITL gate):
# orchestrator.py
from langgraph.graph import END, START, StateGraph

# GlobalState, the *_agent objects, and hitl_gate come from the project's own modules.
graph = StateGraph(GlobalState)
graph.add_node("requirements", requirement_agent.run)
graph.add_node("testcases", testcase_agent.run)
graph.add_node("verification", verification_agent.run)
graph.add_node("selenium", selenium_agent.run)
graph.add_node("api", api_agent.run)
graph.add_node("execution", execution_agent.run)
graph.add_node("bugs", bug_agent.run)
graph.add_node("healing", healing_agent.run)
graph.add_node("report", report_agent.run)

graph.add_edge(START, "requirements")  # entry point
graph.add_edge("requirements", "testcases")
graph.add_edge("testcases", "verification")
# Fan out to both script generators, then wait for both before executing.
graph.add_edge("verification", "selenium")
graph.add_edge("verification", "api")
graph.add_edge(["selenium", "api"], "execution")
# hitl_gate returns True (approved) or False (rejected).
graph.add_conditional_edges("execution", hitl_gate, {
    True: "bugs",
    False: END,  # HITL rejected
})
graph.add_edge("bugs", "healing")
graph.add_edge("healing", "report")
graph.add_edge("report", END)
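The conditional branch relies on a routing function that returns `True` or `False`. The project's actual `hitl_gate` is not shown here, so the following is only a minimal sketch of how such a gate and its path map could behave. The `hitl_approved` state key, the `route_after_execution` helper, and the local `END_NODE` sentinel are hypothetical names used for illustration.

```python
from typing import TypedDict


class GateState(TypedDict, total=False):
    # Hypothetical flag set once a human reviewer approves the execution results.
    hitl_approved: bool


END_NODE = "END"  # local stand-in for the orchestrator's end-of-graph marker

# Same shape as the path map passed to add_conditional_edges.
PATH_MAP = {True: "bugs", False: END_NODE}


def hitl_gate(state: GateState) -> bool:
    """Return True only when approval was explicitly recorded."""
    return bool(state.get("hitl_approved", False))


def route_after_execution(state: GateState) -> str:
    """Resolve the next node name the way a conditional edge would."""
    return PATH_MAP[hitl_gate(state)]
```

Defaulting to `False` when the flag is missing means an unreviewed run stops at the gate instead of continuing to bug analysis, which is one possible design choice, not necessarily the project's.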
Checkpointing — every node's output is persisted to SQLite before the next node runs. A failed pipeline can be resumed from the last checkpoint.
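To illustrate the resume behaviour, here is a minimal standard-library sketch, assuming a linear list of nodes and a JSON-serialisable state. It is not the project's `CheckpointStore`: the `checkpoints` table layout and the `run_with_checkpoints` function are hypothetical.

```python
import json
import sqlite3
from typing import Callable

Node = tuple[str, Callable[[dict], dict]]  # (node name, node function)


def run_with_checkpoints(
    conn: sqlite3.Connection,
    session_id: str,
    nodes: list[Node],
    initial_state: dict,
) -> dict:
    """Run nodes in order, committing each node's output before the next runs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "session_id TEXT, step INTEGER, node TEXT, state TEXT, "
        "PRIMARY KEY (session_id, step))"
    )
    # Find the most recent checkpoint for this session, if any.
    row = conn.execute(
        "SELECT step, state FROM checkpoints WHERE session_id = ? "
        "ORDER BY step DESC LIMIT 1",
        (session_id,),
    ).fetchone()
    if row is None:
        start, state = 0, dict(initial_state)
    else:
        start, state = row[0] + 1, json.loads(row[1])

    for step in range(start, len(nodes)):
        name, fn = nodes[step]
        state = fn(state)  # may raise; earlier checkpoints stay committed
        conn.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?, ?)",
            (session_id, step, name, json.dumps(state)),
        )
        conn.commit()
    return state
```

Calling the function again with the same `session_id` after a failure skips every node that already has a checkpoint and continues from the first one that does not.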
RAG engine (ChromaDB)
Four collections in ChromaDB, all using all-MiniLM-L6-v2 embeddings:
| Collection | What's stored | Used by |
|---|---|---|
| test_cases | Historical test cases + outcomes | TestCaseAgent (similar tests) |
| bugs | Bug reports + root causes + fixes | BugAgent (RAG correlation) |
| requirements | Parsed requirement modules | RequirementAgent (context) |
| knowledge_base | Ingested docs (PDFs, Markdown) | All agents |
# core/rag_engine.py
from chromadb import EmbeddingFunction
from sentence_transformers import SentenceTransformer


class RAGEngine:
    def query(self, collection: str, query: str, n_results: int = 5):
        ...

    def add(self, collection: str, documents: list[str], metadata: list[dict]):
        ...
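The method bodies are elided above. To show the add-then-query flow the interface implies, here is a self-contained sketch that mirrors the two signatures but swaps in a toy bag-of-words vector and cosine similarity in place of ChromaDB and all-MiniLM-L6-v2. `InMemoryRAG`, `embed`, and `cosine` are hypothetical names, and the return shape is an assumption.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryRAG:
    """Same method names as RAGEngine, backed by plain Python lists."""

    def __init__(self) -> None:
        self._collections: dict[str, list[tuple[str, dict, Counter]]] = {}

    def add(self, collection: str, documents: list[str], metadata: list[dict]) -> None:
        if len(documents) != len(metadata):
            raise ValueError("documents and metadata must have the same length")
        bucket = self._collections.setdefault(collection, [])
        for doc, meta in zip(documents, metadata):
            bucket.append((doc, meta, embed(doc)))

    def query(self, collection: str, query: str, n_results: int = 5):
        """Return up to n_results (document, metadata) pairs, most similar first."""
        q = embed(query)
        scored = [
            (cosine(q, vec), doc, meta)
            for doc, meta, vec in self._collections.get(collection, [])
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(doc, meta) for _, doc, meta in scored[:n_results]]
```

A real engine would delegate storage and nearest-neighbour search to the vector database; the sketch only shows how an agent could store past items and retrieve the closest ones as context.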
Storage
SQLite (via SQLAlchemy, 6 tables):
| Table | Contents |
|---|---|
| requirements | Parsed requirement modules per session |
| testcases | Generated test cases |
| executions | Test execution results |
| bugs | Bug reports from BugAgent |
| reports | Final GO/NO GO reports |
| trust_domains | HITL trust domain registry |
SQLAlchemy models are in storage/models.py. Swap to PostgreSQL by changing DB_URL — no code changes.
Self-healing locator hierarchy
When a Selenium test fails with a NoSuchElementException, HealingAgent walks this 7-level fallback:
1. ID (#login-btn)
2. Name (name="username")
3. data-testid (data-testid="submit")
4. data-qa (data-qa="checkout-btn")
5. CSS selector (.login-form > button[type=submit])
6. XPath (//button[contains(text(),'Log in')])
7. AI-generated (Gemini generates a new locator from page DOM snapshot)
The healed locator is saved back to the test script and stored in ChromaDB for future sessions.
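The fallback walk can be sketched as a loop over candidate locators in priority order, with the AI-generated locator as the last resort. This is an illustrative sketch, not HealingAgent's actual code: `heal_locator`, `LocatorNotFound`, and the `find` and `ai_fallback` callables are hypothetical. In a real run, `find` would wrap the browser driver's element lookup and `LocatorNotFound` would be Selenium's `NoSuchElementException`.

```python
from typing import Callable, Optional

Locator = tuple[str, str]  # (strategy, value), e.g. ("id", "login-btn")


class LocatorNotFound(Exception):
    """Stand-in for Selenium's NoSuchElementException."""


def heal_locator(
    candidates: list[Locator],
    find: Callable[[Locator], object],
    ai_fallback: Optional[Callable[[], Optional[Locator]]] = None,
) -> Locator:
    """Return the first locator that resolves, trying the AI fallback last."""
    for locator in candidates:  # levels 1-6, already in priority order
        try:
            find(locator)
        except LocatorNotFound:
            continue
        return locator
    if ai_fallback is not None:  # level 7
        suggestion = ai_fallback()
        if suggestion is not None:
            find(suggestion)  # raises LocatorNotFound if the suggestion is wrong
            return suggestion
    raise LocatorNotFound("no candidate locator matched")


# Usage with a fake page that only exposes a data-testid attribute.
fake_dom = {("data-testid", "submit")}


def fake_find(locator: Locator) -> Locator:
    if locator not in fake_dom:
        raise LocatorNotFound(str(locator))
    return locator


healed = heal_locator(
    [("id", "login-btn"), ("name", "username"), ("data-testid", "submit")],
    fake_find,
)
```

Verifying the AI suggestion with `find` before returning it keeps an unverified locator from being written back to the test script.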
Tech stack
| Layer | Technology | Version |
|---|---|---|
| LLM | Google Gemini 2.5 Flash | google-generativeai >= 0.8 |
| Orchestration | LangGraph | 0.3.* |
| RAG | ChromaDB + all-MiniLM-L6-v2 | chromadb >= 0.5 |
| CLI | Typer + Rich | typer >= 0.12 |
| UI | Streamlit + Plotly + streamlit-agraph | streamlit >= 1.40 |
| Browser | Selenium 4 + webdriver-manager | selenium >= 4.0 |
| HTTP tests | httpx | httpx >= 0.27 |
| Storage | SQLAlchemy + SQLite | sqlalchemy >= 2.0 |
| Validation | Pydantic v2 | pydantic >= 2.0 |
| Observability | LangSmith + Loguru | |
Deployment modes
| Mode | Use case | Notes |
|---|---|---|
| Local dev | Full dev loop with real browser | EXECUTION_MODE=LOCAL |
| Streamlit Cloud | Demo, CI preview | EXECUTION_MODE=MOCK, no Chrome |
| Docker + Selenium Grid | Parallel real browser CI | EXECUTION_MODE=GRID |
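One way to read the `EXECUTION_MODE` setting is to parse it into an enum at startup and fail fast on unknown values. The sketch below assumes the variable comes from the process environment and that `LOCAL` is the default when it is unset. Both are assumptions, and `ExecutionMode`, `resolve_execution_mode`, and `needs_browser` are hypothetical names.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class ExecutionMode(str, Enum):
    LOCAL = "LOCAL"  # real browser on the developer machine
    MOCK = "MOCK"    # no Chrome, e.g. Streamlit Cloud
    GRID = "GRID"    # Docker + Selenium Grid


def resolve_execution_mode(
    env: Optional[Mapping[str, str]] = None,
    default: ExecutionMode = ExecutionMode.LOCAL,  # assumed default
) -> ExecutionMode:
    """Parse EXECUTION_MODE, rejecting values outside the three known modes."""
    env = os.environ if env is None else env
    raw = env.get("EXECUTION_MODE", "").strip().upper()
    if not raw:
        return default
    try:
        return ExecutionMode(raw)
    except ValueError:
        allowed = ", ".join(m.value for m in ExecutionMode)
        raise ValueError(
            f"EXECUTION_MODE={raw!r} is not one of: {allowed}"
        ) from None


def needs_browser(mode: ExecutionMode) -> bool:
    """MOCK is the only mode documented as running without Chrome."""
    return mode is not ExecutionMode.MOCK
```

Passing the environment as a mapping keeps the function easy to test without mutating `os.environ`.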