Welcome to Models & Infrastructure

The nuts and bolts of running AI. This section is for people who care about what happens under the hood: how models run, where they run, how to make them run faster and cheaper, and how to train them for specific needs.

What belongs here:

Hosting & Inference — Local model hosting (Ollama, llama.cpp, vLLM), cloud inference providers (Together, Fireworks, Groq, Replicate, RunPod), API usage and pricing, GPU hardware discussion, performance optimization and benchmarking, quantization, memory management, serving strategies, and self-hosting setups. (A minimal local-inference sketch follows this list.)

Training & Fine-Tuning — Fine-tuning techniques (LoRA, QLoRA, SFT, DPO, RLHF), domain-adaptive continued pre-training, dataset preparation and curation, training infrastructure and tooling, experiment tracking and evaluation, cost and compute planning, and sharing results and trained models. (A minimal LoRA sketch also follows below.)
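
If you want a feel for the hosting side, here is a minimal sketch of calling a locally hosted model over HTTP. It assumes an Ollama server running on its default port (11434) with a model named llama3 already pulled; both the server and the model name are assumptions, so substitute whatever you actually run.

```python
# Minimal sketch: query a local Ollama server.
# Assumes Ollama is running on its default port and that the model
# "llama3" has already been pulled; swap in your own model name.
import requests

def generate(prompt: str, model: str = "llama3") -> str:
    # Ollama's /api/generate endpoint; with stream=False the server
    # returns a single JSON object instead of streamed chunks.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(generate("Explain KV-cache quantization in one sentence."))
```

The same pattern carries over to other local servers: vLLM and the llama.cpp server both expose OpenAI-compatible endpoints, so mostly only the URL and payload shape change.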
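
On the training side, here is a minimal LoRA configuration sketch using the Hugging Face peft library. The base model (facebook/opt-350m) and every hyperparameter below are illustrative assumptions rather than recommendations; pick values that fit your task and hardware.

```python
# Minimal sketch: attach LoRA adapters to a causal LM with peft.
# Model name and hyperparameters are illustrative assumptions only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # small example model

config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # prints how few weights actually train
# From here, hand `model` to your usual Trainer or training loop.
```

The point of the config is that only the small adapter matrices train while the base weights stay frozen, which is what makes single-GPU fine-tunes practical in the first place.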

Whether you’re running a quick LoRA fine-tune on a single GPU or setting up a production inference endpoint, bring your questions and share your results.