AI Framework LLAMACPP Hybrid CPU Strategy
The problem: hybrid cores aren’t symmetric multiprocessing

On Intel’s Alder Lake and later hybrid parts, a CPU is no longer a uniform pool of identical cores. It is a mix of P-cores (fast, wide out-of-order, typically hyperthreaded) and E-cores (slower per thread, no hyperthreading, and lacking some ISA features the P-core design has, such as AVX-512 on some SKUs). ggml’s threadpool knows none of this by default: it spins up N threads and expects them to make roughly equal progress on equal-sized work slices. On a P+E hybrid part that assumption breaks: the E-core threads become stragglers, and every generation step blocks on the slowest thread. Throwing more threads at the problem makes it worse, not better, because past a fairly low thread count token generation is bound by memory bandwidth, not compute. ...
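
The straggler effect can be sketched with a toy model. This is not ggml code: the function names, the 6 P + 4 E layout, and the relative speeds (P = 1.0, E = 0.5) are assumptions invented for illustration, not measurements. The model only captures the synchronization cost of equal-sized slices on unequal cores, and it ignores the memory-bandwidth ceiling entirely.

```python
# Toy model of one synchronized compute step (hypothetical, not ggml code).
# Work is split into equal slices, one per thread, and the step finishes
# only when the slowest thread finishes its slice.

def step_time(work_units, speeds):
    """Return the time for one step given each thread's relative speed."""
    slice_size = work_units / len(speeds)
    return max(slice_size / s for s in speeds)

def idle_fractions(work_units, speeds):
    """Return the fraction of the step each thread spends waiting at the barrier."""
    total = step_time(work_units, speeds)
    slice_size = work_units / len(speeds)
    return [1.0 - (slice_size / s) / total for s in speeds]

# Assumed relative speeds, chosen only to make the arithmetic easy to follow.
P_SPEED, E_SPEED = 1.0, 0.5
WORK = 60.0

p_only = [P_SPEED] * 6                  # one thread per P-core
hybrid = [P_SPEED] * 6 + [E_SPEED] * 4  # P-cores plus E-cores

print("P-only step time:", step_time(WORK, p_only))   # 10.0
print("Hybrid step time:", step_time(WORK, hybrid))   # 12.0
print("Hybrid idle fractions:", idle_fractions(WORK, hybrid))
```

Under these assumed numbers, adding four E-core threads makes the step slower (12.0 versus 10.0), because each P-core thread finishes its smaller slice early and then waits at the barrier for half of the step.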