Experimenting with bearblog for now
🧑‍💻 Research & Engineering - LLM Inference & High Performance systems. | Berlin 🇩🇪 | https://venkat.eu | 💬 https://twitter.com/Venkat2811
Back-of-the-envelope calculations to estimate a model's GPU memory requirements, plus insights into HW/SW optimizations

Exploring Locality of Reference, LMAX Disruptor & Flash Attention

A digestible high-level overview of what happens in The Die

Debugging resource leakage and optimizing server configuration
