Multi-Agent Orchestration Framework
Testing CrewAI vs LangGraph for orchestrating complex, multi-step agentic workflows. Benchmarking task completion rate, token cost, and latency across 5 real business scenarios.
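The three metrics can be captured by a small framework-agnostic harness; a minimal sketch, where `run_workflow` and `judge` are hypothetical stand-ins for the orchestrator under test (CrewAI or LangGraph) and the completion grader:

```python
import time

def benchmark(run_workflow, scenarios, judge):
    """Time each scenario, count tokens, and score completion.

    run_workflow(scenario) is assumed to return {"output": ..., "tokens": int};
    judge(scenario, output) returns True if the task was completed.
    """
    records = []
    for scenario in scenarios:
        start = time.perf_counter()
        result = run_workflow(scenario)
        latency = time.perf_counter() - start
        records.append({
            "scenario": scenario["name"],
            "completed": judge(scenario, result["output"]),
            "tokens": result["tokens"],
            "latency_s": latency,
        })
    n = len(records)
    return {
        "completion_rate": sum(r["completed"] for r in records) / n,
        "avg_tokens": sum(r["tokens"] for r in records) / n,
        "avg_latency_s": sum(r["latency_s"] for r in records) / n,
        "runs": records,
    }
```

Running the same five scenarios through both orchestrators with this harness keeps the comparison apples-to-apples.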
Real experiments, prototypes, and proofs of concept from the Multimodal Minds engineering lab. Built in public. Shipped raw. No polish, just signal.
Combining sparse BM25 keyword retrieval with dense vector embeddings using reciprocal rank fusion. Tested on a 50k document corpus, achieving 23% better precision@5 vs pure vector search.
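The fusion step itself is only a few lines. A minimal sketch of reciprocal rank fusion over two ranked result lists, using the k=60 constant from the original RRF formulation:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked doc-id lists: score(d) = sum over lists of 1 / (k + rank(d)).

    rankings: e.g. [bm25_ranking, vector_ranking], each a list of doc ids
    ordered best-first. Documents missing from a list contribute nothing.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A nice property of RRF is that it needs no score normalization, which matters because BM25 scores and cosine similarities live on incompatible scales.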
Exploring STT → LLM → TTS pipelines with sub-800ms end-to-end latency. Testing Deepgram, Whisper Turbo, and ElevenLabs Turbo v2.5 in combination with streaming LLM responses.
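One way to keep perceived latency low is to start TTS as soon as the LLM completes a sentence, rather than waiting for the full response. A minimal sketch with hypothetical `stt_chunks`, `llm`, and `tts` stand-ins for the streaming providers:

```python
def stream_pipeline(stt_chunks, llm, tts):
    """Yield synthesized audio per completed sentence instead of per response.

    stt_chunks: transcript fragments from the STT stream.
    llm: callable returning an iterator of response tokens.
    tts: callable turning a text span into audio.
    """
    transcript = "".join(stt_chunks)
    buffer = ""
    for token in llm(transcript):
        buffer += token
        if token.endswith((".", "!", "?")):   # sentence boundary: flush to TTS
            yield tts(buffer.strip())
            buffer = ""
    if buffer.strip():                        # flush any trailing partial sentence
        yield tts(buffer.strip())
```

With this shape, time-to-first-audio is bounded by the first sentence, not the whole completion.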
Building a lightweight data drift monitor using Evidently AI connected to an Airflow DAG. Automatically triggers model retraining when statistical drift exceeds threshold across key feature distributions.
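Evidently handles the statistics in the actual monitor; the triggering logic can be sketched independently with a hand-rolled Population Stability Index (the 0.2 threshold below is a common rule of thumb, not the lab's tuned value):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0
    def frac(sample, b):
        in_bin = sum(
            1 for x in sample
            if lo + b * width <= x < lo + (b + 1) * width
            or (b == bins - 1 and x == hi)      # put the max value in the last bin
        )
        return max(in_bin / len(sample), 1e-6)   # floor to avoid log(0)
    return sum(
        (frac(actual, b) - frac(expected, b))
        * math.log(frac(actual, b) / frac(expected, b))
        for b in range(bins)
    )

def should_retrain(baseline, live, features, threshold=0.2):
    """Flag retraining when any feature's PSI exceeds the threshold."""
    drifted = [f for f in features if psi(baseline[f], live[f]) > threshold]
    return (len(drifted) > 0, drifted)
```

In the Airflow DAG, the boolean from `should_retrain` is what would gate the downstream retraining task.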
Automating extraction from complex PDFs (tables, charts, handwriting) using GPT-4V and Claude 3.5. Compared structured output quality against traditional OCR + regex pipelines at scale.
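Comparing the two pipelines requires a scoring function; one simple option is cell-level accuracy against a hand-labeled ground-truth table (a sketch of the idea, not the full comparison methodology):

```python
def table_accuracy(predicted, gold):
    """Fraction of ground-truth cells reproduced exactly (after whitespace strip).

    predicted, gold: tables as lists of rows, each row a list of cell values.
    """
    total = correct = 0
    for p_row, g_row in zip(predicted, gold):
        for p, g in zip(p_row, g_row):
            total += 1
            correct += (str(p).strip() == str(g).strip())
    return correct / total if total else 0.0
```

The same scorer can rank GPT-4V, Claude 3.5, and the OCR + regex baseline on identical documents.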
Deploying quantized LLaMA 3 (GPTQ 4-bit) on AWS Lambda + Graviton3 for ultra cost-efficient inference. Exploring cold-start mitigation with provisioned concurrency and model caching strategies.
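The core of the warm-start strategy is that module scope survives across invocations of the same Lambda execution environment, so anything loaded there is paid for only on cold start. A minimal sketch; `loader` is a placeholder for whatever deserializes the quantized checkpoint:

```python
# Module scope persists across warm invocations of the same Lambda container.
_MODEL_CACHE = {}

def get_model(name, loader):
    """Return a cached model, calling the expensive loader only on cold start."""
    if name not in _MODEL_CACHE:
        _MODEL_CACHE[name] = loader(name)
    return _MODEL_CACHE[name]

def make_handler(loader):
    """Build a Lambda-style handler around the module-level cache."""
    def handler(event, context=None):
        model = get_model(event.get("model", "llama3-gptq-4bit"), loader)
        return {"completion": model(event["prompt"])}
    return handler
```

Provisioned concurrency then keeps a pool of containers warm so most requests hit the cache path.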
An agent that writes Python code, executes it in a sandboxed environment, reads the error traceback, and iteratively fixes its own mistakes. Achieved 87% task success on HumanEval-style benchmarks.
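The loop itself is simple: generate, execute in isolation, feed the traceback back. A minimal sketch where `generate` stands in for the LLM call and the "sandbox" is just a separate interpreter process (a real sandbox would add resource and filesystem limits):

```python
import os
import subprocess
import sys
import tempfile

def run_sandboxed(code, timeout=10):
    """Execute code in a separate interpreter; return (ok, stderr_text)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, text=True, timeout=timeout)
        return proc.returncode == 0, proc.stderr
    finally:
        os.unlink(path)

def solve(task, generate, max_attempts=4):
    """Iteratively generate and repair code until it runs cleanly.

    generate(task, feedback) is the model call; feedback is None on the
    first attempt and the last traceback afterwards.
    """
    feedback = None
    for _ in range(max_attempts):
        code = generate(task, feedback)
        ok, feedback = run_sandboxed(code)
        if ok:
            return code
    return None
```

Capping `max_attempts` matters in practice: unfixable tasks otherwise burn tokens on every retry.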
A custom evaluation framework for LLM outputs in production. Automated grading via judge LLMs + human baselines, with dashboards tracking hallucination rate, instruction-following, and output consistency over time.
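Once a judge returns per-sample verdicts, the dashboard metrics are straightforward aggregation. A sketch assuming a hypothetical `judge(prompt, output)` that returns `hallucinated`, `followed_instructions`, and `score` fields; in production that call is a judge LLM, but any grader with this shape works:

```python
def grade_outputs(samples, judge):
    """Aggregate per-sample judge verdicts into dashboard-level metrics.

    samples: [{"prompt": ..., "output": ...}, ...]
    judge: callable returning {"hallucinated": bool,
                               "followed_instructions": bool,
                               "score": float}
    """
    verdicts = [judge(s["prompt"], s["output"]) for s in samples]
    n = len(verdicts)
    return {
        "hallucination_rate": sum(v["hallucinated"] for v in verdicts) / n,
        "instruction_following": sum(v["followed_instructions"] for v in verdicts) / n,
        "mean_score": sum(v["score"] for v in verdicts) / n,
    }
```

Running the same aggregation over human-graded samples gives the baseline the judge LLM is calibrated against.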
Using Neo4j to model entity relationships extracted from enterprise docs, then using Cypher queries as a retrieval layer alongside vector search. Testing multi-hop reasoning on compliance and legal documents.
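The multi-hop idea can be shown without a running Neo4j instance; this sketch walks an in-memory adjacency map the way a Cypher pattern like `(a)-[:CITES]->(b)-[:GOVERNED_BY]->(c)` would, with hypothetical entity and relation names:

```python
def multi_hop(graph, start, rel_path):
    """Follow a fixed chain of relationship types from a start entity.

    graph: {entity: [(relation, entity), ...]} adjacency map.
    rel_path: ordered relation types, one per hop.
    Returns the set of entities reachable via exactly that chain.
    """
    frontier = {start}
    for rel in rel_path:
        frontier = {dst
                    for src in frontier
                    for r, dst in graph.get(src, [])
                    if r == rel}
    return frontier
```

In the actual experiment, the hits from a traversal like this are merged with vector-search results before being handed to the LLM.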
This is where I prototype, break, and rebuild ideas before they become products or services. Every experiment here is something I'm actively exploring in the context of real business problems, not academic exercises.
Some things will fail. Some will become part of the Multimodal Minds service stack. All of it gets documented here, openly.
Experiments are released when functional, not polished.
Every experiment includes methodology, results, and honest failure analysis.
Experiments are designed with real deployment constraints in mind.
# Multimodal Minds Labs
from mm_labs import Experiment

# Define the experiment
exp = Experiment(
    name="hybrid-rag-v2",
    tags=["rag", "retrieval"],
    hypothesis="BM25 + vector fusion beats pure vector"
)

# Run and log results
results = exp.run()
exp.log({
    "precision@5": 0.847,
    "latency_ms": 312,
    "cost_per_1k": 0.0042
})

# ✓ Experiment complete
print("Status: SHIPPED")
New experiments drop every few weeks. Follow along on LinkedIn or get notified when something ships.