Miniloader runs agents, models, and tools on your own machine. No subscriptions, no leaked prompts, no terminal required.
New on Miniloader:AC Coach, real-time AI coaching for Assetto Corsa.
OpenClaw burns tokens faster than most users expect. Usage is invisible. Costs spiral before you notice.
Hermes is capable but command-line only and hard to configure. Not an option for most people.
Ollama has no workflow layer. LM Studio is a backend with no front end. Neither helps you build a system.
Pick a preset. Slot in your modules. Auto-wiring does the rest. You're running in minutes, with full visibility.
personal AI assistant · desktop + tools + workflows
open-source coding agent · terminal + IDE + desktop
NousResearch Hermes · instruction-tuned models
VS Code BYOK agent · human-in-the-loop approval
structured VS Code agent · plan / code / debug modes
CLI pair programmer · git-native editing
CLI + desktop agent · Block open-source
VS Code + JetBrains BYOK agent
framework for composing LLM applications
personal AI assistant · desktop + tools + workflows
open-source coding agent · terminal + IDE + desktop
NousResearch Hermes · instruction-tuned models
VS Code BYOK agent · human-in-the-loop approval
structured VS Code agent · plan / code / debug modes
CLI pair programmer · git-native editing
CLI + desktop agent · Block open-source
VS Code + JetBrains BYOK agent
framework for composing LLM applications
Cure token anxiety. Let local models handle the routine work, and save premium tokens for the tasks that actually need them.
Central orchestration loop for chat turns, tool calls, and streamed responses. Takes client requests, prepares model calls, and coordinates tool execution round-trips through the tools channel. Expects an OpenAI-compatible backend upstream and tracks connectivity state for that endpoint.
Local inference engine that loads GGUF models through llama.cpp and generates token streams. Process-isolated so model runtime crashes do not take down the main Hypervisor process. Core local model runtime used behind server-facing modules such as gpt_server.
In-process LiteLLM gateway for cloud providers that acts as a drop-in OpenAI-compatible API source for the same downstream wiring used by local stacks.
LiveKit Voice handles realtime STT/TTS by turning speech into agent turns and streaming responses back as synthesized audio with browser join configuration.
OpenClaw install and auto-config helper against your local AI server configuration.
Ngrok Tunnel publishes local services to the internet by consuming routing config and emitting tunnel status with a public URL on the same channel.
17 modules ship with Miniloader: Local Brain, Chat Terminal, GPT Server, Database, File Access, and more.
VIEW ALLSupported Services
*coming soon
Adhitya breaks the RK3588 NPU memory limit with nano-tiling and scheduler-level workarounds to run Vision Transformers on low-cost edge hardware.
By treating FP8 as storage instead of compute, Adhitya maps FP8 data through lookup-table decoding into INT8 tensor core paths for large VRAM savings on Ampere.
Embodied AI, agentic avatars, AI NPCs — whatever you call it — is the next frontier for agents. At Miniloader we're actively breaking those boundaries down.
Seven models, three families, one consumer desktop. Every model in the default catalogue goes through a full capability run before it ships. These are the results.
Be one of the first to run Miniloader on your machine.