12 results
- llama.cpp (Featured) · llm-inference · LLM inference in C/C++. · Low setup · CPU Only · Commercial use · MIT · ★ 111k · Updated 25d ago
- Ollama (Featured) · llm-inference · Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models. · Medium setup · CPU Only · Commercial use · MIT · ★ 172k · Updated 25d ago
- vLLM (Featured) · llm-inference · A high-throughput and memory-efficient inference and serving engine for LLMs. · Medium setup · NVIDIA GPU (CUDA) · Commercial use · Apache 2.0 · ★ 80k · Updated 25d ago
- ExLlamaV2 · llm-inference · A fast inference library for running LLMs locally on modern consumer-class GPUs. · Medium setup · NVIDIA GPU (CUDA) · Commercial use · MIT · ★ 4.5k · Updated 26d ago
- KoboldCpp · llm-inference · Run GGUF models easily with a KoboldAI UI. One file. Zero install. · Low setup · CPU Only · AGPL 3.0 · ★ 11k · Updated 25d ago
- LocalAI · llm-inference · LocalAI is the open-source AI engine. Run any model (LLMs, vision, voice, image, video) on any hardware. No GPU required. · Low setup · CPU Only · Commercial use · MIT · ★ 46k · Updated 25d ago
- MLC LLM · llm-inference · Universal LLM Deployment Engine with ML Compilation. · Low setup · NVIDIA GPU (CUDA) · Commercial use · Apache 2.0 · ★ 23k · Updated 26d ago
- MLX (Apple) · llm-inference · MLX: An array framework for Apple silicon. · Medium setup · Apple Silicon (Metal) · Commercial use · MIT · ★ 26k · Updated 25d ago
- SGLang · llm-inference · SGLang is a high-performance serving framework for large language models and multimodal models. · Medium setup · NVIDIA GPU (CUDA) · Commercial use · Apache 2.0 · ★ 28k · Updated 25d ago
- TabbyAPI · llm-inference · The official API server for ExLlama. OpenAI-compatible, lightweight, and fast. · Medium setup · NVIDIA GPU (CUDA) · AGPL 3.0 · ★ 1.2k · Updated 25d ago
- TensorRT-LLM (NVIDIA) · llm-inference · TensorRT-LLM provides an easy-to-use Python API to define large language models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also contains components to create Python and C++ runtimes that orchestrate inference execution in a performant way. · Medium setup · NVIDIA GPU (CUDA) · Commercial use · Other · ★ 14k · Updated 25d ago
- Text Generation Inference (Hugging Face) · llm-inference · Large Language Model Text Generation Inference. · Medium setup · NVIDIA GPU (CUDA) · Commercial use · Apache 2.0 · ★ 11k · Updated 25d ago
Offgrid AI tools · Updated daily
Enclavetools
Stop paying for AI APIs. Everything here runs on your hardware.