Rethinking vllm-metal's Memory Budget for Apple Silicon
Exploring memory allocation strategies for LLM inference engines on Apple Silicon's unified memory architecture.