Open Lab · Work In Progress

Where AI Ideas
Come to Life.

Real experiments, prototypes, and proofs of concept from the Multimodal Minds engineering lab. Built in public. Shipped raw. No polish, just signal.

9 Experiments · All In Progress · Actively Building
Agents · In Progress

Multi-Agent Orchestration Framework

Testing CrewAI vs LangGraph for orchestrating complex, multi-step agentic workflows. Benchmarking task completion rate, token cost, and latency across 5 real business scenarios.

CrewAI LangGraph GPT-4o Python
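A framework-agnostic harness keeps that comparison fair: wrap each orchestrator behind the same callable and measure identical metrics. A minimal sketch; the `runner` contract and metric names here are this page's assumptions, not CrewAI or LangGraph APIs:

```python
import time

def benchmark(runner, scenarios):
    """Run one orchestrator (a callable wrapping a CrewAI crew or a
    LangGraph graph) over the scenario set and collect the comparison metrics."""
    records = []
    for scenario in scenarios:
        t0 = time.perf_counter()
        outcome = runner(scenario)  # expected shape: {"success": bool, "tokens": int}
        records.append((outcome["success"], outcome["tokens"],
                        time.perf_counter() - t0))
    n = len(records)
    return {
        "completion_rate": sum(s for s, _, _ in records) / n,
        "avg_tokens": sum(t for _, t, _ in records) / n,
        "avg_latency_s": sum(l for _, _, l in records) / n,
    }

# stub runner standing in for either framework under test
stub = lambda scenario: {"success": True, "tokens": 1200}
metrics = benchmark(stub, ["invoice triage", "lead enrichment"])
```

Because both frameworks hide behind the same `runner` signature, the harness never favors one API shape over the other.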
RAG · In Progress

Hybrid RAG: BM25 + Vector Search

Combining sparse BM25 keyword retrieval with dense vector embeddings via reciprocal rank fusion. Tested on a 50k-document corpus: 23% better precision@5 than pure vector search.

ChromaDB BM25Okapi text-embedding-3 FastAPI
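Reciprocal rank fusion itself is only a few lines: each retriever contributes 1/(k + rank) per document, so documents surfaced by both lists float to the top. A self-contained sketch; k=60 is the value from the original RRF paper, and the doc IDs are illustrative:

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]    # sparse keyword ranking
vector_hits = ["d1", "d5", "d3"]  # dense embedding ranking
fused = rrf_fuse([bm25_hits, vector_hits])  # d1 and d3 appear in both lists
```

Because only ranks are used, no score normalization is needed between the BM25 and cosine-similarity scales.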
Voice · In Progress

Ultra-Low Latency Voice Agent

Exploring STT → LLM → TTS pipelines with sub-800ms end-to-end latency. Testing Deepgram, Whisper Turbo, and ElevenLabs Turbo v2.5 in combination with streaming LLM responses.

Deepgram Whisper ElevenLabs WebSockets
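With every stage streaming into the next, only each stage's time-to-first-output sits on the critical path; full completion time does not. A toy budget under that assumption; all numbers are hypothetical placeholders, not measured vendor latencies:

```python
def first_audio_latency_ms(stt_final, llm_first_token, tts_first_chunk, network=0):
    """Time until the caller hears the first audio when STT, LLM, and TTS
    are chained over streaming connections (e.g. WebSockets): the sum of
    each stage's time-to-first-output, not its full completion time."""
    return stt_final + llm_first_token + tts_first_chunk + network

# hypothetical per-stage budgets targeting the sub-800ms goal
latency = first_audio_latency_ms(stt_final=250, llm_first_token=300,
                                 tts_first_chunk=150, network=80)
```

The same arithmetic explains why streaming matters: waiting for the full LLM completion before starting TTS would put the whole generation time on the critical path.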
MLOps · In Progress

Model Drift Detection Pipeline

Building a lightweight data drift monitor using Evidently AI connected to an Airflow DAG. Automatically triggers model retraining when statistical drift across key feature distributions exceeds a set threshold.

Evidently AI Apache Airflow MLflow Docker
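The core drift check is small enough to sketch without Evidently: compare a reference feature distribution against the live one with the Population Stability Index and fire retraining above a threshold. The 0.2 cutoff is a common rule of thumb, and the equal-width binning here is a simplification of what Evidently computes:

```python
import math

def psi(reference, current, bins=10):
    """Population Stability Index between two numeric samples."""
    lo = min(min(reference), min(current))
    hi = max(max(reference), max(current))
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def bin_fracs(sample):
        counts = [0] * bins
        for x in sample:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # floor at a tiny fraction so the log term stays defined
        return [max(c / len(sample), 1e-6) for c in counts]

    ref, cur = bin_fracs(reference), bin_fracs(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

reference = [float(i) for i in range(100)]
shifted = [float(i) + 50 for i in range(100)]  # simulated drifted feature
```

An Airflow task can call `psi` per feature and branch into the retraining DAG whenever any value crosses the threshold.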
Vision · In Progress

Multimodal Document Intelligence

Automating extraction from complex PDFs (tables, charts, handwriting) using GPT-4V and Claude 3.5. Comparing structured-output quality against traditional OCR + regex pipelines at scale.

GPT-4V Claude 3.5 Tesseract LangChain
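The "traditional" side of that comparison is easy to pin down as a baseline: OCR text piped through a regex per field. A sketch of one such field extractor; the field and pattern are hypothetical examples, and the VLM side would return the same value as structured JSON:

```python
import re

# Regex baseline for one field of the comparison: the invoice total.
TOTAL_PATTERN = re.compile(r"total\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE)

def extract_total(ocr_text):
    """Pull a dollar total out of OCR'd invoice text; None if no match."""
    m = TOTAL_PATTERN.search(ocr_text)
    return float(m.group(1).replace(",", "")) if m else None

value = extract_total("Invoice Total: $1,234.56")
```

Every layout variant needs its own pattern, which is exactly the brittleness the VLM approach is being scored against.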
Infra · In Progress

Serverless LLM Inference on AWS

Deploying quantized LLaMA 3 (GPTQ 4-bit) on AWS Lambda + Graviton3 for highly cost-efficient inference. Exploring cold-start mitigation with provisioned concurrency and model caching strategies.

AWS Lambda LLaMA 3 GPTQ Terraform
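The standard caching move is to hoist the expensive model load out of the handler: anything cached at module scope (or behind an `lru_cache`) survives across warm invocations of the same Lambda execution environment. A minimal sketch with the load stubbed out; names and paths are placeholders:

```python
import functools

@functools.lru_cache(maxsize=1)
def get_model():
    """Expensive load (e.g. GPTQ weights from /tmp or EFS) runs once per
    execution environment: on cold start, not on every invocation."""
    return {"name": "llama-3-gptq-4bit"}  # stand-in for the loaded model

def handler(event, context=None):
    model = get_model()  # cache hit on every warm invocation
    return {"model": model["name"], "echo": event.get("prompt", "")}

response = handler({"prompt": "hello"})
```

Provisioned concurrency attacks the same problem from the other side, by keeping pre-initialized environments warm so `get_model` has already run before traffic arrives.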
Agents · In Progress

Self-Correcting Code Agent

An agent that writes Python code, executes it in a sandboxed environment, reads the error traceback, and iteratively fixes its own mistakes. Achieved 87% task success on HumanEval-style benchmarks.

GPT-4o E2B Sandbox LangChain Python
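The core loop is: generate, execute, capture the traceback, regenerate with the traceback as feedback. A sketch with the LLM and sandbox stubbed out; a real run would call the model and execute inside an isolated sandbox such as E2B, not a bare in-process `exec`:

```python
import traceback

def self_correct(generate, task, max_attempts=3):
    """generate(task, feedback) stands in for the LLM call; feedback is
    None on the first attempt and the last traceback afterwards."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)
        try:
            exec(code, {})  # real version: run inside a sandbox, not in-process
            return code, attempt
        except Exception:
            feedback = traceback.format_exc()
    raise RuntimeError(f"no working code after {max_attempts} attempts")

# toy generator: first draft crashes, second "fixes" it after seeing the error
drafts = iter(["result = 1 / 0", "result = 6 * 7"])
code, attempts = self_correct(lambda task, fb: next(drafts), "compute result")
```

The traceback is the whole trick: it gives the model a concrete failure signal instead of asking it to guess what went wrong.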
MLOps · In Progress

LLM Eval Harness

A custom evaluation framework for LLM outputs in production. Automated grading via judge LLMs + human baselines, with dashboards tracking hallucination rate, instruction-following, and output consistency over time.

Prometheus Grafana GPT-4o-mini Pydantic
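The judge pattern reduces to: send each (context, output) pair to a grader model against a strict schema, then aggregate into dashboard metrics. A sketch with the judge stubbed by a substring check; the `Grade` schema and metric names are illustrative assumptions, not a standard:

```python
from dataclasses import dataclass

@dataclass
class Grade:
    grounded: bool              # is the answer supported by the context?
    follows_instructions: bool

def aggregate(samples, judge):
    """judge(sample) -> Grade stands in for a GPT-4o-mini grading call."""
    grades = [judge(s) for s in samples]
    n = len(grades)
    return {
        "hallucination_rate": sum(not g.grounded for g in grades) / n,
        "instruction_follow_rate": sum(g.follows_instructions for g in grades) / n,
    }

# stub judge: flags answers that assert facts absent from the context
stub_judge = lambda s: Grade(grounded=s["answer"] in s["context"],
                             follows_instructions=True)
metrics = aggregate(
    [{"context": "Paris is in France.", "answer": "Paris is in France"},
     {"context": "Paris is in France.", "answer": "Paris is in Spain"}],
    stub_judge,
)
```

The aggregated dict is what gets exported as gauges for the Prometheus/Grafana dashboards, with human-graded batches as the calibration baseline.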
RAG · In Progress

Graph RAG: Knowledge Graph Retrieval

Using Neo4j to model entity relationships extracted from enterprise docs, then issuing Cypher queries as a retrieval layer alongside vector search. Testing multi-hop reasoning on compliance and legal documents.

Neo4j LangChain Pinecone Claude 3.5
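One way to wire the two retrieval layers together: a multi-hop Cypher query anchored on a named entity, interleaved with the vector hits and deduplicated. Both the query shape and the merge policy below are illustrative assumptions, not the final design:

```python
# Hypothetical multi-hop retrieval: from a named regulation, walk citation
# edges up to two hops out and return source chunks (run via the Neo4j driver).
MULTI_HOP_CYPHER = """
MATCH (r:Regulation {name: $name})<-[:CITES*1..2]-(c:Chunk)
RETURN DISTINCT c.id AS chunk_id
"""

def interleave(graph_hits, vector_hits, limit=5):
    """Alternate graph and vector results, deduplicating by chunk id."""
    seen, merged = set(), []
    for pair in zip(graph_hits, vector_hits):
        for chunk_id in pair:
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append(chunk_id)
    return merged[:limit]

context_ids = interleave(["c12", "c7"], ["c7", "c99"])
```

Interleaving guarantees graph-derived evidence survives into the context window even when the vector index never surfaces it, which is the point of multi-hop retrieval.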
About This Lab

The Engineering
Sandbox.

This is where I prototype, break, and rebuild ideas before they become products or services. Every experiment here is something I'm actively exploring in the context of real business problems, not academic exercises.

Some things will fail. Some will become part of the Multimodal Minds service stack. All of it gets documented here, openly.

Ship Fast, Learn Faster

Experiments are released when functional, not polished.

Documented in Public

Every experiment includes methodology, results, and honest failure analysis.

Production-Oriented

Experiments are designed with real deployment constraints in mind.

Discuss an Idea →
labs/run_experiment.py
# Multimodal Minds Labs
from mm_labs import Experiment

# Define the experiment
exp = Experiment(
  name="hybrid-rag-v2",
  tags=["rag", "retrieval"],
  hypothesis="BM25 + vector fusion beats pure vector",
)

# Run and log results
results = exp.run()
exp.log({
  "precision@5": 0.847,
  "latency_ms": 312,
  "cost_per_1k": 0.0042
})

# ✅ Experiment complete
print("Status: SHIPPED")

Stay in the Loop

New experiments drop every few weeks. Follow along on LinkedIn or get notified when something ships.