Quick Explainer: GPU Programming with CUDA and Triton
What are CUDA and Triton? CUDA (Compute Unified Device Architecture) was introduced by NVIDIA to let developers like us program GPUs directly. CUDA provides a low-level C/C++ API for writing programs that execute on the GPU. The code to be executed on the GPU is wrapped inside a function, and this function is called a kernel. Not every machine learning practitioner is an expert in the low-level programming languages supported by CUDA....
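To make the kernel idea concrete without needing a GPU, here is a pure-Python simulation of the CUDA execution model (illustrative only, not real CUDA or Triton code): a kernel is a function run once per thread, and each thread computes a global index from its block and thread coordinates to decide which array element it handles.

```python
# Pure-Python sketch of the CUDA execution model (no GPU involved).
# A "kernel" runs once per thread; each thread works on one element.

def vector_add_kernel(a, b, out, block_idx, block_dim, thread_idx):
    """One 'thread' of a vector-add kernel."""
    i = block_idx * block_dim + thread_idx  # global index, as in CUDA
    if i < len(a):                          # bounds guard, as in a real kernel
        out[i] = a[i] + b[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Simulate a kernel launch: run the kernel for every (block, thread) pair."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(*args, block_idx, block_dim, thread_idx)

a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
out = [0] * len(a)
# 2 blocks of 4 threads = 8 threads; the guard skips the 3 extra ones.
launch(vector_add_kernel, 2, 4, a, b, out)
print(out)  # [11, 22, 33, 44, 55]
```

On a real GPU these threads run in parallel across thousands of cores, and `block_idx`/`thread_idx` come from the built-in `blockIdx.x`/`threadIdx.x` variables rather than loop counters; the indexing arithmetic, however, is exactly the same.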
Quick Explainer: Activation Checkpointing
What is activation checkpointing? Activation checkpointing is a technique for saving GPU memory while training large deep learning models. During training, we normally store all the activations in memory so that we can calculate the gradients during the backward pass. Activation checkpointing skips storing most of these activations and instead recomputes them during the backward pass when they are needed, trading extra compute for a large memory saving. The figure below will give you an idea of the huge amount of memory consumed by activations while training a model:...
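The store-some, recompute-the-rest idea can be sketched in plain Python (a conceptual toy, not the PyTorch `torch.utils.checkpoint` API): the forward pass keeps only every k-th activation as a checkpoint, and anything the backward pass needs is recomputed forward from the nearest saved checkpoint.

```python
# Conceptual sketch of activation checkpointing with a toy "layer".

def layer(x, i):
    # Hypothetical deterministic layer, stand-in for a real network layer.
    return x * 2 + i

def forward_checkpointed(x, n_layers, k):
    """Run n_layers, saving activations only at multiples of k."""
    saved = {0: x}  # layer index -> checkpointed activation
    for i in range(n_layers):
        x = layer(x, i)
        if (i + 1) % k == 0:
            saved[i + 1] = x
    return x, saved

def recompute(saved, upto, k):
    """Rebuild the activation after `upto` layers from the nearest checkpoint."""
    start = (upto // k) * k  # nearest checkpoint at or below `upto`
    x = saved[start]
    for i in range(start, upto):
        x = layer(x, i)
    return x

out, saved = forward_checkpointed(1.0, n_layers=8, k=4)
# Only 3 activations kept (after layers 0, 4, 8) instead of all 9;
# the backward pass recomputes the rest segment by segment.
print(sorted(saved))  # [0, 4, 8]
```

With n layers and checkpoints every k layers, memory drops from O(n) stored activations to O(n/k + k), at the cost of roughly one extra forward pass of compute; this is the trade-off real implementations make.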
Getting Started in the World of Stable Diffusion
This is a series of blog posts in which I share what I’ve explored in the course ‘From Deep Learning Foundations to Stable Diffusion’ by FastAI. Everything I share in this post is based on the initial lessons of the FastAI course, which is publicly available here. The lessons also have accompanying Jupyter notebooks on Stable Diffusion, which can be found in this GitHub repo. Let’s jump right into the post with the question: ‘What is Stable Diffusion?...