Essential Math & Concepts for LLM Inference

Back-of-the-envelope calculations to estimate a model's GPU memory requirements, and insights into HW/SW optimizations

May 31, 2024