MINILOADER BLOG: RESEARCH & DEVELOPMENT
Backporting FP8 to the RTX 3090 (No H100 Required)
NVIDIA's FP8 story usually requires an H100, but Adhitya disagrees. By treating FP8 as a storage format rather than a compute format, he backports the numerics to consumer Ampere hardware. Weights are stored as 1-byte FP8 bit patterns in VRAM, decoded on the fly via a 256-entry LUT, and fed into INT8 tensor cores (IMMA) for the matmul. The result: up to a 2× smaller VRAM footprint without needing next-gen silicon.
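To make the "FP8 as a storage format" idea concrete, here is a minimal Python sketch of a 256-entry decode table. It assumes the E4M3 layout (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7, no infinities); the teaser does not say which FP8 variant the post uses, and the names `decode_e4m3`, `LUT`, and `dequantize` are hypothetical. It covers only the lookup step, not the IMMA matmul.

```python
import math

def decode_e4m3(byte: int) -> float:
    """Decode one FP8 byte, assuming the E4M3 layout (an assumption, see above)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF   # 4 exponent bits
    man = byte & 0x7          # 3 mantissa bits
    if exp == 0xF and man == 0x7:
        return math.nan       # E4M3 reserves this pattern for NaN
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias = -6
        return sign * (man / 8.0) * 2.0 ** -6
    # Normal: implicit leading 1, exponent bias of 7
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

# One entry per possible byte value, so decoding is a single table lookup.
LUT = [decode_e4m3(b) for b in range(256)]

def dequantize(packed: bytes) -> list[float]:
    """Expand 1-byte FP8 weights to floats via the lookup table."""
    return [LUT[b] for b in packed]
```

For example, `dequantize(bytes([0x38, 0x40]))` yields `[1.0, 2.0]`. Each weight costs one byte at rest, and the table itself has only 256 entries.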
Reverse-Engineering the RK3588 NPU: Hacking Memory Limits to Run Vision Transformers
The RK3588 promises 6 TOPS but the standard SDK chokes on Vision Transformers — leaving a $100 board running SmolVLM at 30s per inference on the CPU. Adhitya reverse-engineered the NPU, discovered a hard 32KB L1 scratchpad limit, and built a nano-tiling algorithm to slice Attention matrices to fit. A "poison pill" dummy op defeats the compiler's fusion heuristics, and a custom user-space scheduler fires 26 shards across the chip's 3 NPU cores. Result: 30s → 1.8s, 0.999 accuracy match.
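The general idea of slicing a matrix to fit a fixed scratchpad can be sketched in Python as below. This is a generic row-tiling illustration under simplifying assumptions, not the nano-tiling algorithm from the post: it budgets for a single buffer only (a real kernel also needs room for inputs and outputs), and `max_tile_rows` and `plan_row_tiles` are hypothetical names.

```python
SCRATCHPAD_BYTES = 32 * 1024  # the 32KB limit quoted in the teaser

def max_tile_rows(cols: int, bytes_per_elem: int,
                  budget: int = SCRATCHPAD_BYTES) -> int:
    """Largest number of full rows that fits in the budget."""
    rows = budget // (cols * bytes_per_elem)
    if rows < 1:
        raise ValueError("a single row exceeds the scratchpad; tile columns too")
    return rows

def plan_row_tiles(rows: int, cols: int, bytes_per_elem: int,
                   budget: int = SCRATCHPAD_BYTES) -> list[tuple[int, int]]:
    """Split `rows` into half-open (start, stop) ranges that each fit the budget."""
    step = max_tile_rows(cols, bytes_per_elem, budget)
    return [(start, min(start + step, rows)) for start in range(0, rows, step)]

# Illustrative shapes only: a 1024 x 1024 matrix of 1-byte elements.
tiles = plan_row_tiles(rows=1024, cols=1024, bytes_per_elem=1)
```

With these illustrative shapes, each tile holds 32 rows (32 × 1024 bytes = 32KB), so the matrix splits into 32 tiles.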
Why can't you see your agent? Can your agent see you?
Agents can't see what we can unless we share the same space. Embodied AI, agentic avatars, AI NPCs — whatever you call it — is the next frontier. At Miniloader we're actively breaking those boundaries down.
The Miniloader Model Family: Benchmarks, Capabilities, and Why We Chose These Models
Seven models from three families, tested on a consumer AMD GPU under the Vulkan backend. Full throughput numbers, vision and reasoning results, and the reasoning behind the Miniloader local LLM catalogue.
OpenClaw Just Turned a Niche Category Into a Real Market
OpenClaw is sitting at 20.1 trillion tokens on OpenRouter — more than the next three services combined. Here is what that number actually means for the local AI market.
From the Team
How Miniloader went from idea to live beta on a compressed timeline—and why AI-assisted execution may be the next wave of individual leverage in computing.
Miniloader at Full Power: Four Real-World Use Cases
From a local RAG research desk to Postgres analytics, bots over Ngrok, and a secured remote command center: how one rack replaces cloud bills and lock-in.
