继续学习 Context Engineering 相关的文章，主要看了这两篇：- 来自 Augment 的《How we made code search 40% faster for 100M+ line codebases using quantized vector search》- 来自 Anthropic 的《Contextual Retrieval in AI Systems》📒 关于 query 的性能：1. 利用 ANN 大幅减少 embeddings vectors 数量，提高检索速度

继续学习 Context Engineering 相关的文章，主要看了这两篇：

- 来自 Augment 的《How we made code search 40% faster for 100M+ line codebases using quantized vector search》
- 来自 Anthropic 的《Contextual Retrieval in AI Systems》

📒 关于 query 的性能：

1. 利用 ANN 大幅减少 embeddings vectors 数量，提高检索速度。
我理解类似于对向量进行聚类，然后只对聚类中心进行搜索，最后再在聚类内进行精确搜索。
但是不确定 Augment 是自己手写了一个 ANN 的聚合层，还是直接使用了类似 pg_vector 的 IVF 等功能。
2. 对 ANN 方案的优化，对 embeddings index 做 SNAPSHOT。只有搜索的 chunk 在 SNAPSHOT 中时才触发 ANN 加速，否则走传统的线性扫描。
这样就可以适应无时不刻都在动态更新的代码库。

📒 关于 query 的准确率和召回率：

1. embeddings vectors 擅长 semantic search，但对精确匹配不友好。结合 BM25 来提高精确匹配的能力。
2. 在 embeddings 以前，先为 chunk 添加上下文信息
有一个优化技巧是，先利用 prompt cache 缓存全文，然后再使用这个 cache 为每一个 chunk 生成上下文。
3. 玄学参数，每次 RAG 提供 20 个 context chunks 的效果最好。Anthropic 的做法是，用混合搜索检索出 400 个 chunks，然后使用 cohere rerank 选出 top20。
4. rerank 会带来额外的延迟，所以 chunks 数量和延迟是一个 trade-off。

👇 next