sequence

%% RAG conversation flow sequence diagram
sequenceDiagram
autonumber
actor User as User (Client)
participant AC as Agent Client
participant GW as API Gateway
participant Auth as AuthService
participant RAG as RAG Pipeline
participant Embed as Embedding Service
participant Ret as Retrieval Service
participant RR as Reranker
participant Skill as Skill Engine
participant Ctx as Context Manager
participant LLM as Inference Service (llama.cpp)
participant DB as PostgreSQL + pgvector

User->>AC: Enter query prompt
AC->>AC: Check local cache
alt Local cache hit
AC-->>User: Return cached result
else Remote processing required
AC->>GW: POST /api/v1/query {query, conversation_id, options}

%% Authentication
GW->>Auth: Validate Bearer Token
Auth-->>GW: ✅ Validated (user_id, role)

%% Load conversation history
GW->>RAG: Process query
RAG->>DB: Fetch conversation history (conversation_id, last N turns)
DB-->>RAG: History messages

%% Embedding
RAG->>Embed: Generate query embedding
Embed-->>RAG: query_vector [768d]

%% Vector retrieval
RAG->>Ret: Vector search (query_vector, top_k, filters)
Ret->>DB: pgvector cosine similarity + metadata filter
DB-->>Ret: Candidate chunks (top_k)
Ret-->>RAG: List[RetrievalResult]

%% Reranking (optional)
opt options.use_reranking = true
RAG->>RR: rerank(query, chunks)
RR-->>RAG: Reranked chunks (top_n)
end

%% Skill matching
RAG->>Skill: match(query, context)
alt Skill matched (confidence > 0.8)
Skill->>Skill: Execute Skill
Skill-->>RAG: skill_output
else No matching Skill
Skill-->>RAG: None
end

%% Context assembly
RAG->>Ctx: fit_context(system_prompt, history, chunks, reserved=1024)
Ctx->>Ctx: Compute total tokens

alt tokens ≤ context_limit
Ctx-->>RAG: Full context messages
else tokens > context_limit
Ctx->>Ctx: Strategy 1: Top-K filtering
alt Still over limit
Ctx->>LLM: Summarize chunks (summarize)
LLM-->>Ctx: condensed summary
Ctx->>Ctx: Strategy 2: Replace original chunks with the summary
alt Still over limit
Ctx->>Ctx: Strategy 3: Truncate early conversation history
end
end
Ctx-->>RAG: Adjusted context messages
end

%% LLM inference
RAG->>LLM: generate(InferenceRequest)
alt Inference succeeded
LLM-->>RAG: InferenceResponse (answer)
else Inference error (context overflow / OOM)
LLM-->>RAG: Error
RAG->>Ctx: Retry with reduced context
Ctx-->>RAG: Leaner context
RAG->>LLM: generate (reduced version)
LLM-->>RAG: InferenceResponse (degraded answer)
end

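%% Answer Verification (optional)
opt Answer verification enabled
RAG->>RAG: Compare answer against retrieved chunks
RAG->>RAG: Compute confidence score
alt Confidence < threshold
RAG->>RAG: Flag as "uncertain, manual review recommended"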
end
end

%% Persist and respond
RAG->>DB: Store user message + assistant message
RAG->>DB: Write audit_log

RAG-->>GW: RAGResponse
GW-->>AC: HTTP 200 {answer, sources, confidence, usage}
AC->>AC: Write to local cache
AC-->>User: Display answer + cited sources
end
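
The context-assembly step in the diagram applies three fallback strategies in order: Top-K filtering, replacing chunks with a summary, and truncating early history. A minimal Python sketch of that ordering follows. It is an illustration under stated assumptions, not the actual Context Manager: the whitespace-based `count_tokens`, the `top_k` default, the `summarize` callback, and the return shape are all hypothetical. Only the name `fit_context` and `reserved=1024` come from the diagram.

```python
from typing import Callable, List, Optional, Tuple


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def fit_context(
    system_prompt: str,
    history: List[str],
    chunks: List[str],
    context_limit: int,
    reserved: int = 1024,
    top_k: int = 3,  # assumed cutoff for Strategy 1
    summarize: Optional[Callable[[List[str]], str]] = None,  # assumed LLM summarizer hook
) -> Tuple[List[str], List[str]]:
    """Return (history, chunks) trimmed to fit context_limit minus the reserved tokens."""
    budget = context_limit - reserved

    def total(h: List[str], c: List[str]) -> int:
        return count_tokens(system_prompt) + sum(map(count_tokens, h)) + sum(map(count_tokens, c))

    # Everything fits: return the full context unchanged.
    if total(history, chunks) <= budget:
        return history, chunks

    # Strategy 1: keep only the top-k chunks (assumed to be sorted by relevance).
    chunks = chunks[:top_k]
    if total(history, chunks) <= budget:
        return history, chunks

    # Strategy 2: replace the remaining chunks with one condensed summary.
    if summarize is not None and chunks:
        chunks = [summarize(chunks)]

    # Strategy 3: drop the oldest history turns until the budget is met.
    history = list(history)
    while history and total(history, chunks) > budget:
        history.pop(0)
    return history, chunks
```

In this sketch the summarizer is injected as a callback, so the function can be exercised without a running inference service; in the diagram that role is played by the call from the Context Manager to the Inference Service.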

sequence_rag
