Introduction

In recent years, Apple Silicon Macs (especially the M4 Max and newer chips) have emerged as powerful platforms for local AI model inference.

This article presents a performance comparison of LLM (Large Language Model) runtime environments on macOS, specifically Ollama, LMStudio, MLX-LM, and llama.cpp, to understand their efficiency and behavior on Apple Silicon.

🔧 Test Environment

OS: macOS Sequoia 15.6.1
CPU: Apple M4 Max
Memory: 48 GiB

🧠 Models Used

Model                                       | Framework         | Notes
openai/gpt-oss-20b                          | LMStudio, Ollama  | Standard model
InferenceIllusionist/gpt-oss-20b-MLX-4bit   | MLX-LM            | MLX-optimized version for Apple Silicon

Model parameters

{
  "temperature": 0.8,
  "top_p": 0.95,
  "min_p": 0.05,
  "top_k": 40,
  "num_ctx": 4096,
  "num_batch": 64
}
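
These parameters map directly onto the "options" field of Ollama's REST API. The following is a minimal sketch, assuming a default local Ollama install on port 11434 and a hypothetical model tag "gpt-oss:20b"; it is not the exact harness used for the benchmark.

# Minimal sketch: sending the sampling parameters above to a local Ollama server.
import requests

options = {
    "temperature": 0.8,
    "top_p": 0.95,
    "min_p": 0.05,
    "top_k": 40,
    "num_ctx": 4096,
    "num_batch": 64,
}

resp = requests.post(
    "http://localhost:11434/api/generate",    # assumed default local endpoint
    json={
        "model": "gpt-oss:20b",               # assumed local model tag
        "prompt": "Summarize the MLX framework in two sentences.",
        "stream": False,
        "options": options,
    },
    timeout=300,
)
print(resp.json()["response"])

LMStudio instead exposes an OpenAI-compatible local server, so the same values have to be mapped onto that API's parameter names.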

Performance Evaluation Metrics

Two primary performance metrics were measured:
TTFT (Time To First Token) — the time between sending a prompt and receiving the first token of output.
TPS (Tokens Per Second) — the number of tokens generated per second.
In addition, CPU and memory usage were continuously monitored throughout the tests.
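
To make these two metrics concrete, the sketch below measures TTFT and TPS against a local Ollama server through its streaming API. It is illustrative only, assuming the same endpoint and model tag as above rather than reproducing the actual test harness.

import json
import time

import requests

URL = "http://localhost:11434/api/generate"    # assumed default Ollama endpoint

payload = {
    "model": "gpt-oss:20b",                    # assumed local model tag
    "prompt": "Explain the benefits of unified memory for LLM inference.",
    "stream": True,
}

start = time.perf_counter()
first_token_at = None
tokens = 0

with requests.post(URL, json=payload, stream=True, timeout=600) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)               # one JSON object per streamed line
        if chunk.get("response"):
            if first_token_at is None:
                first_token_at = time.perf_counter()   # first token arrived: TTFT
            tokens += 1                        # rough count: one chunk per token
        if chunk.get("done"):
            break

end = time.perf_counter()
print(f"TTFT: {first_token_at - start:.2f} s")
print(f"TPS:  {tokens / (end - first_token_at):.1f} tokens/s")

Ollama's final streamed object also includes eval_count and eval_duration fields, which give an exact token count and generation time if the per-chunk approximation above is too coarse.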

🧩 Observations and Results

🧱 LMStudio + gpt-oss-20b
Low CPU load
Memory usage roughly proportional to model size
Memory usage temporarily increases during warm-up (initialization) and cool-down (idle)

⚙️ MLX-LM + gpt-oss-20b-MLX
Optimized for Apple Silicon; the highest performance observed
Excellent CPU efficiency with consistently high token throughput

🧰 Ollama
MLX not yet supported (as of Sept 2025, a related pull request is pending)
Stable inference performance, but slower than the MLX-optimized model

⚡ With Apple’s MLX library, even a standalone Mac can deliver practical large language model inference.
Across 1,200 test runs, MLX consistently showed superior CPU and memory efficiency, along with the best throughput.
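
For readers who want to try the MLX path themselves, the MLX-quantized model from the table above can be driven from Python with the mlx-lm package. This is a rough sketch under the assumption of a standard mlx-lm install; the exact API surface can differ between mlx-lm releases.

# Rough sketch: running the 4-bit MLX build of gpt-oss-20b via mlx-lm.
# Requires an Apple Silicon Mac and `pip install mlx-lm`.
from mlx_lm import load, generate

# Downloads the model from Hugging Face on first use.
model, tokenizer = load("InferenceIllusionist/gpt-oss-20b-MLX-4bit")

text = generate(
    model,
    tokenizer,
    prompt="Explain why unified memory helps local LLM inference.",
    max_tokens=256,
    verbose=True,   # prints generation statistics, including tokens per second
)
print(text)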

The accompanying slides (in Japanese) are available here: https://file.m-cloud.dev/index.php/s/FbMXMj8Yqf8BbbG?openfile=true
