# LangChain integration overview

> Build RAG pipelines with LangChain's retriever, chain, and document store abstractions

This module provides LangChain-based retrieval and RAG pipeline implementations across all five supported vector database backends. Every feature is organized as a self-contained directory with configuration files, indexing scripts, and search scripts for each backend.

## What you get

* Seventeen retrieval and RAG patterns implemented using LangChain's retriever, chain, and document store abstractions
* Full portability across Pinecone, Weaviate, Chroma, Milvus, and Qdrant with feature-specific notes on backend support
* YAML-driven configuration with environment variable substitution so credentials stay out of code
* Evaluation support via shared `utils/evaluation.py` metrics
* Shared reusable components (`components/`) and helper factories (`utils/`) that all feature pipelines draw from

## Vector database support

All pipelines support five backends:

* **Pinecone**: Managed vector database with hybrid search capabilities
* **Weaviate**: Open-source vector search with schema-based filtering
* **Chroma**: Embedded database for prototyping and local use
* **Milvus**: Scalable vector search with partition support
* **Qdrant**: High-performance search with payload indexing

## Module structure

Each feature directory follows the same layout:

```
feature_name/
├── configs/
│   ├── chroma_triviaqa.yaml
│   ├── milvus_triviaqa.yaml
│   ├── pinecone_triviaqa.yaml
│   ├── qdrant_triviaqa.yaml
│   ├── weaviate_triviaqa.yaml
│   └── (one config per backend × dataset combination)
├── indexing/
│   ├── chroma.py
│   ├── milvus.py
│   ├── pinecone.py
│   ├── qdrant.py
│   └── weaviate.py
├── search/
│   ├── chroma.py
│   ├── milvus.py
│   ├── pinecone.py
│   ├── qdrant.py
│   └── weaviate.py
└── README.md
```

Indexing scripts load a dataset, embed documents using `EmbedderHelper`, and upsert them into the target backend. Search scripts embed a query, retrieve candidates, apply post-retrieval processing, and optionally generate an answer using `RAGHelper`.

## Feature catalog

* `semantic_search`: Dense vector similarity search with HuggingFace embeddings
* `hybrid_indexing`: Combine dense and sparse embeddings with Reciprocal Rank Fusion
* `reranking`: Two-stage retrieval with HuggingFace cross-encoder models
* `mmr`: Maximal Marginal Relevance for balancing relevance and diversity
* `query_enhancement`: Multi-query, HyDE, and step-back prompting for better recall
* `contextual_compression`: Compress retrieved documents to query-relevant fragments
* `agentic_rag`: Multi-step iterative RAG with reflection and routing
* `metadata_filtering`: Structured filter constraints applied at query time
* `multi_tenancy`: Tenant-scoped indexing and retrieval with isolation
* `namespaces`: Logical data partitioning within shared indexes
* `parent_document_retrieval`: Index child chunks, return parent documents
* `json_indexing`: Structured fields from JSON preserved as metadata

## Embedding configuration

All LangChain feature pipelines read embedding configuration from YAML:

```yaml
embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"  # Required: full model path
  device: "cpu"  # Optional: "cpu" or "cuda"
  batch_size: 32  # Optional
```

For hybrid and sparse features, also include:

```yaml
sparse:
  model: "naver/splade-cocondenser-ensembledistil"  # Required for sparse embedder
```

## RAG configuration

Generation is controlled by the `rag` section:

```yaml
rag:
  enabled: true
  model: "llama-3.3-70b-versatile"
  api_key: "${GROQ_API_KEY}"
  temperature: 0.7
  max_tokens: 2048
```

The LangChain `RAGHelper` uses `ChatGroq` for generation. Set `enabled: false` to run retrieval-only pipelines.

## Recommended onboarding path

1. Run `semantic_search` on your target backend with a small dataset limit and verify the pipeline completes successfully.
2. Extract evaluation queries from the dataset and measure baseline retrieval metrics.
3. Add one improvement feature at a time. Start with `reranking` (usually the highest single-step gain) or `hybrid_indexing` (for mixed query types).
4. Once quality is stable, layer in `multi_tenancy` or `namespaces` for data isolation.
5. Use `cost_optimized_rag` to find acceptable quality-cost tradeoffs, and `agentic_rag` for complex multi-step reasoning tasks.

## Feature selection guide

| If you need...                      | Use                         |
| ----------------------------------- | --------------------------- |
| Starting point and baseline         | `semantic_search`           |
| Both semantic and keyword precision | `hybrid_indexing`           |
| Pure keyword/lexical precision      | `sparse_indexing`           |
| Better final ranking                | `reranking`                 |
| Relevant + diverse result set       | `mmr`                       |
| Less redundant context              | `diversity_filtering`       |
| Structured constraints              | `metadata_filtering`        |
| JSON-native documents               | `json_indexing`             |
| Better query recall                 | `query_enhancement`         |
| Shorter, cleaner context            | `contextual_compression`    |
| Token/cost budget control           | `cost_optimized_rag`        |
| Iterative multi-step reasoning      | `agentic_rag`               |
| Long docs with fragment search      | `parent_document_retrieval` |
| Per-customer data isolation         | `multi_tenancy`             |
| Logical data segmentation           | `namespaces`                |

## LangChain vs Haystack

Both frameworks provide similar capabilities but with different design philosophies:

**LangChain**: Component-oriented with chains and runnables. Uses `Document` objects and retriever interfaces.
**Haystack**: Pipeline-oriented with nodes and pipelines. Uses `Document` objects and node interfaces.

**LangChain**: Native integration with ChatGroq, OpenAI, Anthropic via langchain-\* packages.
**Haystack**: Integration via generator nodes and prompt builders.

**LangChain**: `HuggingFaceEmbeddings` from `langchain-huggingface`.
**Haystack**: `SentenceTransformersDocumentEmbedder` and `SentenceTransformersTextEmbedder`.

**LangChain**: Manual fusion with `ResultMerger` using Reciprocal Rank Fusion.
**Haystack**: Built-in support with `DocumentJoiner` and RRF ranker.

Choose LangChain if you prefer chain composition and already use LangChain in your stack. Choose Haystack for pipeline-based workflows and deeper integration with Hugging Face models.

## Next steps

* Start with baseline dense vector retrieval
* Combine dense and sparse for robust retrieval
* Explore reusable LangChain components
* Build agentic RAG with routing and reflection
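
Reciprocal Rank Fusion, which the hybrid search feature and the framework comparison both rely on, can be sketched in a few lines. This is a minimal standalone illustration and not the module's `ResultMerger`: the function name `reciprocal_rank_fusion`, the constant `k = 60` (a commonly used default), and the document IDs are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. The value k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a dense and a sparse retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists (`doc_b`, `doc_a`) rise to the top, while documents found by only one retriever are kept but ranked lower. Because the fusion uses ranks rather than raw scores, it needs no score normalization between dense and sparse retrievers.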