
Model evaluations

Pre-rendered evaluation pages for popular models on common GPUs. Every page is real llm-cal output.
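The Weight column in the tables below is consistent with a simple parameter-count times bytes-per-parameter estimate (e.g. Qwen2.5-72B at BF16: roughly 72.7B params × 2 bytes ≈ 145.4 GB). A minimal sketch of that arithmetic — the function name is mine, not part of llm-cal, and FP8/mixed-precision models carry extra overhead (scales, higher-precision layers) that this ignores:

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Estimate raw weight memory in decimal GB: params * bytes per param."""
    return n_params * bytes_per_param / 1e9

# Qwen2.5-72B-Instruct: ~72.7B params at BF16 (2 bytes/param)
print(round(weight_memory_gb(72.7e9, 2.0), 1))  # 145.4

# microsoft/Phi-4: ~14.7B params at BF16
print(round(weight_memory_gb(14.65e9, 2.0), 1))  # 29.3
```

Note this is weights only; a production deployment also needs memory for KV cache and activations, which is why the Prod GPUs counts exceed the bare minimum needed to hold the weights.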

Qwen/Qwen2.5-72B-Instruct

| GPU | Weight | Quant | Prod GPUs |
| --- | --- | --- | --- |
| A100-80G | 145.4 GB | BF16 | 8 |
| H100 | 145.4 GB | BF16 | 8 |
| H800 | 145.4 GB | BF16 | 8 |

Qwen/Qwen2.5-7B

| GPU | Weight | Quant | Prod GPUs |
| --- | --- | --- | --- |
| L40S | 15.2 GB | BF16 | 4 |
| RTX4090 | 15.2 GB | BF16 | 7 |

Qwen/Qwen3-30B-A3B

| GPU | Weight | Quant | Prod GPUs |
| --- | --- | --- | --- |
| A100-80G | 61.1 GB | BF16 | 4 |
| H100 | 61.1 GB | BF16 | 4 |

deepseek-ai/DeepSeek-V3

| GPU | Weight | Quant | Prod GPUs |
| --- | --- | --- | --- |
| B200 | 688.6 GB | FP8 | 8 |
| H100 | 688.6 GB | FP8 | 8 |
| H800 | 688.6 GB | FP8 | 8 |

deepseek-ai/DeepSeek-V4-Flash

| GPU | Weight | Quant | Prod GPUs |
| --- | --- | --- | --- |
| 910B4 | 159.6 GB | FP4_FP8_MIXED | 8 |
| B200 | 159.6 GB | FP4_FP8_MIXED | 2 |
| H100 | 159.6 GB | FP4_FP8_MIXED | 8 |
| H800 | 159.6 GB | FP4_FP8_MIXED | 8 |

microsoft/Phi-4

| GPU | Weight | Quant | Prod GPUs |
| --- | --- | --- | --- |
| L40S | 29.3 GB | BF16 | 8 |
| RTX4090 | 29.3 GB | BF16 | 8 |

mistralai/Mixtral-8x7B-v0.1

| GPU | Weight | Quant | Prod GPUs |
| --- | --- | --- | --- |
| A100-80G | 93.4 GB | BF16 | 8 |
| H100 | 93.4 GB | BF16 | 8 |