Tags: openclaw, memory, hybrid-search, bm25, mmr, algorithm
An in-depth look at how Hybrid Search (vector + BM25) is implemented and how it can be tuned
Background
Limitations of pure vector search
flowchart LR
  Q["Query: API authentication"] --> V[Embedding model]
  V -->|embed| E[Query Vector]
  E -->|cosine similarity| D1["Doc: API design doc"]
  E -->|cosine similarity| D2["Doc: OAuth guide"]
  E -->|cosine similarity| D3["Doc: User manual"]
  D1 -->|0.85| R1[Match]
  D2 -->|0.82| R2[Match]
  D3 -->|0.45| R3[No match]
Problems:
- Cannot guarantee exact keyword matches (e.g. "OAuth" vs. "authentication")
- May miss documents that contain the exact term but differ slightly in meaning from the query
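To make the cosine-similarity scores in the diagram concrete, here is a minimal sketch of the comparison itself. The three-dimensional vectors are made up for illustration (real embedding models produce hundreds or thousands of dimensions), and the function is not taken from the project's source.

```typescript
// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero length, to avoid dividing by zero.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical toy embeddings (not produced by a real model).
const queryVec = [1, 0, 1];
const sameDirection = [2, 0, 2]; // same direction, different magnitude
const orthogonal = [0, 1, 0];    // shares no component with the query

console.log(cosineSimilarity(queryVec, sameDirection)); // approximately 1
console.log(cosineSimilarity(queryVec, orthogonal));    // 0
```

Because the score depends only on direction in embedding space, two documents can score close together without sharing a single literal keyword, which is why exact-term queries can slip through.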
Limitations of pure BM25
flowchart LR
  Q["Query: how to implement authentication"] --> B[BM25]
  B -->|match| D1["Doc: implementation guide"]
  B -->|match| D2["Doc: authentication API"]
  B -->|no match| D3["Doc: auth best practices"]
  D1 -->|0.9| R1[High rank]
  D2 -->|0.8| R2[High rank]
  D3 -->|0.2| R3[Low rank]
Problems:
- "authentication" and "auth" are treated as different terms
- No understanding of semantic similarity
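The exact-term behaviour can be reproduced with a small BM25 scorer. This is a sketch under stated assumptions: a plain lowercase tokenizer with no stemming or synonym handling, and the commonly used defaults k1 = 1.2 and b = 0.75. The names `tokenizeWords` and `bm25Scores` are hypothetical and are not the project's search backend.

```typescript
// Lowercase and split on anything that is not a letter or digit.
function tokenizeWords(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(t => t.length > 0);
}

// Score every document against the query with the standard BM25 formula.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const docTokens = docs.map(tokenizeWords);
  const totalLen = docTokens.reduce((sum, tokens) => sum + tokens.length, 0);
  const avgLen = totalLen / Math.max(1, docs.length);
  const queryTerms = Array.from(new Set(tokenizeWords(query)));

  return docTokens.map(tokens => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = tokens.filter(t => t === term).length;
      if (tf === 0) continue; // a term that is absent contributes nothing
      const df = docTokens.filter(d => d.indexOf(term) !== -1).length;
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      const lengthNorm = 1 - b + b * (tokens.length / avgLen);
      score += idf * (tf * (k1 + 1)) / (tf + k1 * lengthNorm);
    }
    return score;
  });
}

const bm25Docs = ["implementation guide", "authentication API", "auth best practices"];
const bm25Result = bm25Scores("authentication", bm25Docs);
console.log(bm25Result); // only the second document gets a non-zero score
```

The "auth best practices" document scores exactly zero here: without stemming or synonyms, "auth" and "authentication" are unrelated tokens.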
Solution: hybrid search
Architecture
flowchart TB
  Q[Query] --> E[Embed Query]
  subgraph "Dual-path recall"
    E --> V[Vector Search]
    Q --> K[Keyword Search]
  end
  subgraph "Result fusion"
    V -->|scores| M[Merge]
    K -->|scores| M
    M --> F[Fusion]
  end
  subgraph "Post-processing"
    F --> T[Temporal Decay]
    T --> R[MMR Rerank]
  end
  R --> Result
Core implementation
File: src/memory/hybrid.ts
export function mergeHybridResults(
  vectorResults: VectorResult[],
  keywordResults: KeywordResult[],
  vectorWeight: number = 0.7, // configurable
  textWeight: number = 0.3    // configurable
): HybridResult[] {
  const merged = new Map<string, HybridResult>();

  // 1. Normalize vector scores to 0-1.
  // Fall back to 1 when there are no results or the top score is not positive,
  // so the division below can never produce NaN or Infinity.
  const rawMax = Math.max(0, ...vectorResults.map(r => r.score));
  const maxVectorScore = rawMax > 0 ? rawMax : 1;
  for (const result of vectorResults) {
    const vectorScore = result.score / maxVectorScore;
    merged.set(result.id, {
      ...result,
      vectorScore,
      textScore: 0,
      finalScore: vectorScore * vectorWeight
    });
  }

  // 2. Normalize BM25 scores.
  for (const result of keywordResults) {
    // A lower BM25 rank is better; convert it to a 0-1 score.
    // rank 0 = 1.0, rank 10 = 0.09
    const textScore = 1 / (1 + Math.max(0, result.rank));
    if (merged.has(result.id)) {
      // Already present in the vector results: add the text contribution.
      const existing = merged.get(result.id)!;
      existing.textScore = textScore;
      existing.finalScore += textScore * textWeight;
    } else {
      // Keyword-only match.
      merged.set(result.id, {
        ...result,
        vectorScore: 0,
        textScore,
        finalScore: textScore * textWeight
      });
    }
  }

  // 3. Sort by final score, highest first.
  return Array.from(merged.values())
    .sort((a, b) => b.finalScore - a.finalScore);
}
Fusion formula:
finalScore = vectorWeight * normalizedVectorScore +
             textWeight * normalizedBM25Score
Default: 0.7 * vector + 0.3 * text
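A few worked values may help when reading the formula. The sketch below restates the rank-to-score conversion and the weighted sum as two small standalone functions; `rankToScore` and `fuseScores` are hypothetical names introduced only for this illustration.

```typescript
// Convert a 0-based keyword rank into a score in (0, 1]; lower rank is better.
function rankToScore(rank: number): number {
  return 1 / (1 + Math.max(0, rank));
}

// Weighted sum of a normalized vector score and a rank-derived text score.
function fuseScores(
  vectorScore: number,
  textScore: number,
  vectorWeight = 0.7,
  textWeight = 0.3
): number {
  return vectorWeight * vectorScore + textWeight * textScore;
}

// Found by both searches, at the top of each: about 0.7 + 0.3 = 1.0
const foundByBoth = fuseScores(1.0, rankToScore(0));
// Found only by vector search: about 0.7
const vectorOnly = fuseScores(1.0, 0);
// Found only by keyword search, at rank 0: about 0.3
const keywordOnly = fuseScores(0, rankToScore(0));

console.log(foundByBoth.toFixed(2), vectorOnly.toFixed(2), keywordOnly.toFixed(2));
console.log(rankToScore(10).toFixed(2)); // 1 / 11, about 0.09
```

With the default weights, a keyword-only hit can never outrank the best vector hit, but it can lift a document that both searches agree on above one that only the vector search found.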
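Candidate pool expansion strategy
// manager-search.ts
async search(query: string, options: SearchOptions) {
  const candidateMultiplier = 4; // configurable
  // Recall 4x as many results as will finally be returned.
  // (queryVector is the embedded query; the embedding step is omitted here.)
  const vectorResults = await this.searchVector(
    queryVector,
    options.maxResults * candidateMultiplier // e.g. 6 * 4 = 24
  );
  const keywordResults = await this.searchKeyword(query);
  // The keyword search is also given the same number of candidates
}
Why expand by 4x?
- Gives the fusion step more candidates to choose from
- Gives MMR diversity reranking enough candidates to work with
- Balances performance against recall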
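The over-fetch-then-truncate pattern behind these points can be sketched in a few lines. The helper names `candidateCount` and `truncateResults` are assumptions made for this example and do not come from the source.

```typescript
// How many candidates to request from each search path.
// Always asks for at least one candidate, even with odd inputs.
function candidateCount(maxResults: number, candidateMultiplier = 4): number {
  return Math.max(1, Math.floor(maxResults * candidateMultiplier));
}

// After fusion and reranking, keep only what the caller asked for.
function truncateResults<T>(ranked: T[], maxResults: number): T[] {
  return ranked.slice(0, Math.max(0, maxResults));
}

const requested = candidateCount(6); // 6 * 4 = 24 candidates per path
const finalList = truncateResults(["a", "b", "c", "d", "e", "f", "g", "h"], 6);
console.log(requested, finalList.length); // 24 6
```

The multiplier only affects how much work the recall stage does; the caller still receives at most `maxResults` items.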
MMR diversity reranking
Problem: homogeneous results
Query: "router configuration"
Top results:
1. "Configured Omada router..." (score: 0.95)
2. "Configured Omada router..." (score: 0.93) ← duplicate!
3. "Configured Omada router..." (score: 0.91) ← duplicate!
MMR algorithm
File: src/memory/mmr.ts
export function applyMMR(
  results: SearchResult[],
  queryVector: number[],
  lambda: number = 0.7, // relevance weight
  maxResults: number
): SearchResult[] {
  const selected: SearchResult[] = [];
  const candidates = [...results];

  while (selected.length < maxResults && candidates.length > 0) {
    let bestMMR = -Infinity;
    let bestIndex = -1;

    for (let i = 0; i < candidates.length; i++) {
      const candidate = candidates[i];

      // 1. Relevance to the query.
      const relevance = cosineSimilarity(queryVector, candidate.embedding);

      // 2. Maximum similarity to any already-selected result.
      let maxSimToSelected = 0;
      for (const sel of selected) {
        const sim = jaccardSimilarity(candidate.text, sel.text);
        maxSimToSelected = Math.max(maxSimToSelected, sim);
      }

      // 3. MMR score = λ * relevance - (1-λ) * similarity
      const mmrScore = lambda * relevance - (1 - lambda) * maxSimToSelected;
      if (mmrScore > bestMMR) {
        bestMMR = mmrScore;
        bestIndex = i;
      }
    }

    // No comparable score (e.g. every score was NaN): stop rather than splice(-1).
    if (bestIndex === -1) break;
    selected.push(candidates.splice(bestIndex, 1)[0]);
  }
  return selected;
}
How the algorithm works:
MMR = λ * relevance(query, doc) - (1-λ) * max(similarity(doc, selected))
λ = 0.7 (default):
- Weights relevance more heavily
- Lightly penalizes duplicate content
λ = 0.5:
- Balances relevance and diversity
λ = 0.3:
- Weights diversity more heavily
- Suited to exploratory search
Text similarity:
function jaccardSimilarity(text1: string, text2: string): number {
  const set1 = new Set(tokenize(text1));
  const set2 = new Set(tokenize(text2));
  const intersection = new Set([...set1].filter(x => set2.has(x)));
  const union = new Set([...set1, ...set2]);
  // Two empty token sets have an empty union; return 0 instead of NaN (0 / 0).
  return union.size === 0 ? 0 : intersection.size / union.size;
}
Jaccard is used here instead of vector similarity because it reflects the degree of textual overlap more directly.
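The snippet above relies on a `tokenize` helper that is not shown. As a rough sketch, assuming a simple lowercase, ASCII-only word tokenizer (text in scripts without spaces, such as Chinese, would need proper segmentation), the overlap measure behaves as follows. The function name `jaccard` is used here only to keep the example self-contained.

```typescript
// Hypothetical tokenizer: lowercase, split on non-alphanumeric characters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(t => t.length > 0);
}

// Jaccard similarity: |intersection| / |union| of the two token sets.
function jaccard(text1: string, text2: string): number {
  const set1 = new Set(tokenize(text1));
  const set2 = new Set(tokenize(text2));
  let shared = 0;
  set1.forEach(token => {
    if (set2.has(token)) shared++;
  });
  const unionSize = set1.size + set2.size - shared;
  return unionSize === 0 ? 0 : shared / unionSize;
}

// 3 shared tokens out of 5 distinct tokens overall.
const nearDuplicate = jaccard("Configured Omada router VLAN", "Configured Omada router DNS");
// No shared tokens at all.
const unrelated = jaccard("Configured Omada router", "Set up AdGuard DNS");

console.log(nearDuplicate.toFixed(2), unrelated.toFixed(2)); // 0.60 0.00
```

Near-duplicate notes share most of their tokens and score high, while notes on different topics score close to zero even when their embeddings are near each other.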
MMR example
Query: "router configuration"
Before MMR:
1. "Configured Omada router..." (0.95)
2. "Configured Omada router..." (0.93)
3. "Configured Omada router..." (0.91)
4. "Set up AdGuard DNS..." (0.85)
5. "Router VLAN config..." (0.82)
After MMR (λ=0.7), taking each score above as the relevance term and assuming a Jaccard similarity of about 1.0 between the three near-duplicate entries and about 0 otherwise:
1. "Configured Omada router..." (0.7 × 0.95 = 0.665) ← selected first
2. "Set up AdGuard DNS..." (0.7 × 0.85 - 0.3 × 0 = 0.595) ← diverse, selected
3. "Router VLAN config..." (0.7 × 0.82 - 0.3 × 0 = 0.574) ← diverse, selected
4. "Configured Omada router..." (0.7 × 0.93 - 0.3 × 1.0 = 0.351) ← highly similar, demoted
5. "Configured Omada router..." (0.7 × 0.91 - 0.3 × 1.0 = 0.337) ← highly similar, demoted
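The reordering in this example can be reproduced with a compact, self-contained version of the selection loop. This is a sketch under assumptions: relevance is supplied directly as a number instead of being computed from embeddings, the texts are the truncated snippets shown above, and `mmrSelect` and `wordJaccard` are hypothetical names.

```typescript
interface Candidate {
  text: string;
  relevance: number; // assumed to be precomputed, e.g. the fused score
}

// Jaccard similarity over lowercase word sets.
function wordJaccard(a: string, b: string): number {
  const words = (s: string) =>
    new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter(w => w.length > 0));
  const setA = words(a);
  const setB = words(b);
  let shared = 0;
  setA.forEach(w => {
    if (setB.has(w)) shared++;
  });
  const unionSize = setA.size + setB.size - shared;
  return unionSize === 0 ? 0 : shared / unionSize;
}

// Greedy MMR: repeatedly pick the candidate with the best
// lambda * relevance - (1 - lambda) * maxSimilarityToSelected.
function mmrSelect(candidates: Candidate[], lambda: number, maxResults: number): Candidate[] {
  const pool = candidates.slice();
  const selected: Candidate[] = [];
  while (selected.length < maxResults && pool.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    pool.forEach((candidate, i) => {
      const maxSim = selected.reduce(
        (max, s) => Math.max(max, wordJaccard(candidate.text, s.text)),
        0
      );
      const score = lambda * candidate.relevance - (1 - lambda) * maxSim;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    });
    selected.push(pool.splice(bestIndex, 1)[0]);
  }
  return selected;
}

const items: Candidate[] = [
  { text: "Configured Omada router", relevance: 0.95 },
  { text: "Configured Omada router", relevance: 0.93 },
  { text: "Configured Omada router", relevance: 0.91 },
  { text: "Set up AdGuard DNS", relevance: 0.85 },
  { text: "Router VLAN config", relevance: 0.82 },
];

const diversified = mmrSelect(items, 0.7, 5).map(c => c.relevance);
const relevanceOnly = mmrSelect(items, 1.0, 5).map(c => c.relevance);
console.log(diversified);   // [0.95, 0.85, 0.82, 0.93, 0.91]
console.log(relevanceOnly); // [0.95, 0.93, 0.91, 0.85, 0.82]
```

Setting λ to 1.0 disables the diversity penalty, so the output falls back to plain relevance order; this makes a handy sanity check when tuning.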
Temporal decay
Problem: old documents rank too high
Query: "Rod standup time"
Without decay:
1. memory/2025-09-15.md - "Rod works Mon-Fri..." (score: 0.91)
2. memory/2026-02-10.md - "Rod has standup at 14:15..." (score: 0.82)
Problem: the 2025-09 document is out of date, yet it scores higher on semantic match!
Exponential decay formula
File: src/memory/temporal-decay.ts
export function applyTemporalDecay(
  results: SearchResult[],
  halfLifeDays: number = 30
): void {
  const now = Date.now();
  const lambda = Math.log(2) / halfLifeDays; // decay coefficient

  for (const result of results) {
    // Extract the date from the file name.
    const fileDate = extractDateFromPath(result.path);
    if (!fileDate) continue; // undated files are not decayed

    // Clamp at 0 so a file dated in the future is never boosted above its raw score.
    const ageInDays = Math.max(0, (now - fileDate.getTime()) / (1000 * 60 * 60 * 24));
    const decayFactor = Math.exp(-lambda * ageInDays);
    result.score *= decayFactor;
    result.temporalDecayApplied = true;
  }
}
Decay curve (half-life of 30 days):
Day 0: score × 1.00 = 100%
Day 7: score × 0.85 = 85%
Day 30: score × 0.50 = 50%
Day 90: score × 0.125 = 12.5%
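The curve follows directly from the half-life definition: the multiplier is exp(-ln 2 × age / halfLife), which halves every `halfLifeDays`. A minimal sketch for checking the values, using a hypothetical standalone `decayFactor` helper:

```typescript
// Exponential decay multiplier for a document of the given age.
// Negative ages (dates in the future) are treated as age 0.
function decayFactor(ageInDays: number, halfLifeDays = 30): number {
  const lambda = Math.LN2 / halfLifeDays;
  return Math.exp(-lambda * Math.max(0, ageInDays));
}

for (const day of [0, 7, 30, 90]) {
  console.log(`Day ${day}: score × ${decayFactor(day).toFixed(3)}`);
}
// Day 0: score × 1.000
// Day 7: score × 0.851
// Day 30: score × 0.500
// Day 90: score × 0.125
```

Shortening the half-life steepens the curve: with `halfLifeDays = 7`, a week-old note is already down to half its original score.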
Special case: undated files such as MEMORY.md and memory/projects.md are not decayed; they are treated as evergreen documents.
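The source does not show how `extractDateFromPath` decides whether a file is dated. One plausible sketch, assuming dated notes are named `YYYY-MM-DD.md` as in the examples in this article, is a regular-expression match that returns `null` for everything else. The implementation below is an assumption, not the project's code.

```typescript
// Hypothetical helper: parse a YYYY-MM-DD date from the end of a file path.
// Returns null for undated ("evergreen") files such as MEMORY.md.
function extractDateFromPath(path: string): Date | null {
  const match = /(\d{4})-(\d{2})-(\d{2})\.md$/.exec(path);
  if (!match) return null;

  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  const date = new Date(Date.UTC(year, month - 1, day));

  // Reject impossible dates such as 2026-02-31, which Date would silently roll over.
  const valid =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day;
  return valid ? date : null;
}

console.log(extractDateFromPath("memory/2026-02-10.md")?.toISOString()); // 2026-02-10T00:00:00.000Z
console.log(extractDateFromPath("MEMORY.md"));          // null
console.log(extractDateFromPath("memory/projects.md")); // null
```

Returning `null` for undated paths lines up with the `if (!fileDate) continue;` check in the decay loop, so evergreen files keep their original score.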
Comparison
Query: "Rod standup time"
With decay (halfLife=30):
1. memory/2026-02-10.md - 0.82 × 1.00 = 0.82 ← today, newest
2. memory/2026-02-03.md - 0.80 × 0.85 = 0.68 ← 7 days ago
3. memory/2025-09-15.md - 0.91 × 0.03 = 0.03 ← about 5 months ago, effectively ignored
Complete search flow
flowchart TB
  Q[User query] --> E[Embed query]
  subgraph "Dual-path recall"
    E -->|Top 24| V[Vector search]
    Q -->|Top 24| K[BM25 search]
  end
  subgraph "Fusion"
    V -->|normalized| F[Weighted sum]
    K -->|normalized| F
    F --> M[Merge and deduplicate]
  end
  subgraph "Post-processing"
    M --> T["Temporal decay: score * decayFactor"]
    T --> R["MMR rerank: lambda * rel - (1-lambda) * sim"]
  end
  R -->|Top 6| Result[Final results]
Performance optimization
1. Parallel queries
// Run vector search and keyword search in parallel
const [vectorResults, keywordResults] = await Promise.all([
  this.searchVector(queryVector, topK * 4),
  this.searchKeyword(query)
]);
2. Database indexes
-- Vector query optimization
CREATE INDEX idx_chunks_path ON chunks(path);
CREATE INDEX idx_chunks_source ON chunks(source);
-- FTS5 is optimized automatically
-- No additional index is needed
3. Early termination
// Early-termination conditions for MMR
if (selected.length >= maxResults) break;
if (bestMMR < threshold) break; // score too low, stop selecting
Parameter tuning recommendations
| Parameter | Default | Tuning advice |
|---|---|---|
| vectorWeight | 0.7 | Keep 0.7 for semantics-first search; lower to 0.5 for keyword-first search |
| textWeight | 0.3 | Complement of vectorWeight; the two should sum to 1 |
| candidateMultiplier | 4 | Lower to 2 for speed; raise to 8 for quality |
| mmr.lambda | 0.7 | Lower to 0.5 when more diversity is needed |
| temporalDecay.halfLifeDays | 30 | Lower to 7 for fast-changing topics; raise to 90 for stable knowledge |
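The table can be read as a single configuration object. The sketch below collects the documented defaults and checks the constraints the table implies, such as the two weights summing to 1. The interface shape and the `validateConfig` helper are assumptions for illustration; only the parameter names and default values come from the table.

```typescript
interface HybridSearchConfig {
  vectorWeight: number;
  textWeight: number;
  candidateMultiplier: number;
  mmr: { lambda: number };
  temporalDecay: { halfLifeDays: number };
}

// Defaults as listed in the table.
const defaultConfig: HybridSearchConfig = {
  vectorWeight: 0.7,
  textWeight: 0.3,
  candidateMultiplier: 4,
  mmr: { lambda: 0.7 },
  temporalDecay: { halfLifeDays: 30 },
};

// Returns a list of human-readable problems; an empty list means the config is usable.
function validateConfig(config: HybridSearchConfig): string[] {
  const problems: string[] = [];
  if (Math.abs(config.vectorWeight + config.textWeight - 1) > 1e-9) {
    problems.push("vectorWeight + textWeight should equal 1");
  }
  if (config.candidateMultiplier < 1) {
    problems.push("candidateMultiplier should be at least 1");
  }
  if (config.mmr.lambda < 0 || config.mmr.lambda > 1) {
    problems.push("mmr.lambda should be between 0 and 1");
  }
  if (config.temporalDecay.halfLifeDays <= 0) {
    problems.push("temporalDecay.halfLifeDays should be positive");
  }
  return problems;
}

// A keyword-leaning variant: both weights are changed together so they still sum to 1.
const keywordLeaning: HybridSearchConfig = { ...defaultConfig, vectorWeight: 0.5, textWeight: 0.5 };

console.log(validateConfig(defaultConfig));  // []
console.log(validateConfig(keywordLeaning)); // []
```

Changing `vectorWeight` without adjusting `textWeight` is the easiest mistake to make, which is why the sum check comes first.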
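Design takeaways
- Complementarity: vector and keyword search cover each other's blind spots
- Candidate pool: recall broadly first, then rerank and truncate
- Diversity: MMR keeps results from becoming homogeneous
- Freshness: temporal decay gives newer content priority
- Configurability: every parameter is tunable to suit different scenarios
Related documents: Memory source-code analysis, Memory design philosophy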