AI | Gurwinder's Blog: AI & Graphics

AI

Understanding Triton Kernels from First Principles

A deep dive into how Triton kernels work, explained from absolute basics to complete understanding. …

Gurwinder Mar 19, 2026 · 7 min read

Under the Hood: How PyTorch Chooses Attention Kernels and Why It Matters for Performance

A deep dive into PyTorch’s attention kernel selection and what each choice means for your …

Gurwinder Sep 20, 2025 · 8 min read

From Theory to Practice: Quantization and Dequantization Made Simple

Quantization transforms floating-point values (‘float32’) into lower-precision formats, such as …

Gurwinder Jan 5, 2025 · 9 min read

Breaking Down Vision Transformers: A Code-Driven Explanation

In this article, I’ll break down the layers of a ViT step by step with code snippets, and a …

Gurwinder Nov 25, 2024 · 4 min read

Harnessing Local Llama to Process Complete Projects: How I Use AI for Code Suggestions and Refactoring My Projects

We’ll walk through a Python script that leverages the LangChain framework to process a codebase, …

Gurwinder Oct 10, 2024 · 6 min read

The Magic of DPAS on Intel's XMX Engines: Cracking Why GPUs are Fast

When you think of multiplying matrices, you probably imagine a lot of numbers flying around and …

Gurwinder Sep 29, 2024 · 5 min read

Code, Run, Debug on AutoPilot: Let Your Local Llama Do All Your Heavy Lifting!

AutoGen isn’t just another framework; it marks a revolutionary leap in leveraging Large …

Gurwinder Jul 31, 2024 · 5 min read

Deep Learning for Graphics Programmers: Performing Tensor Operations with DirectML and Direct3D 12

In the rapidly evolving landscape of machine learning and artificial intelligence, harnessing the …

Gurwinder Jul 14, 2024 · 9 min read

Comparing SYCL, OpenCL, and CUDA: Matrix Multiplication Example

Matrix multiplication is a core operation in scientific and engineering applications, often …

Gurwinder Jul 5, 2024 · 7 min read

The Simple Path to PyTorch Graphs: Dynamo and AOT Autograd Explained

Graph acquisition in PyTorch refers to the process of creating and managing the computational graph …

Gurwinder Apr 6, 2024 · 4 min read

Profiling ResNet Models with PyTorch Profiler for Performance Optimization

In the realm of deep learning, model performance is paramount. Whether you’re working on image …

Gurwinder Apr 2, 2024 · 3 min read

Accelerating Deep Learning Inference on Intel Arc 770: ONNX and PyTorch Go Head-to-Head

When deploying deep learning models, the choice of framework can significantly impact performance. …

Gurwinder Mar 1, 2024 · 5 min read

Warmup Wisdom: Accurate PyTorch Benchmarking Made Simple!

In the realm of PyTorch model benchmarking, achieving accurate results is paramount for gauging …

Gurwinder Feb 10, 2024 · 3 min read

Delving into ONNX: Comprehending Computation Graphs and Structure

ONNX (Open Neural Network Exchange) is an open-source format designed to represent machine learning …

Gurwinder Jun 13, 2023 · 5 min read