GPU Puzzlers
  • Accounting for FLOPS

    Is the GPU faster at addition or multiplication?
  • The Faster Way to Add?

    This puzzle presents three different ways to add elements of a tensor. Can you figure out the fastest implementation?
  • Order of Kernels

    The order of operations matters on the GPU. Can you find the faster ordering?
  • Quantization Quirks

    When is matrix multiplication compute bound and when is it memory bandwidth bound on a GPU?
  • Memorable Mysteries

    What is the optimal way to do a matrix transpose on a GPU?
  • Swimming in Streams

    Can GPUs communicate and compute at the same time?
  • To Fuse or Not to Fuse?

    Can the arithmetic intensity of a program be increased?
  • Communication is the Key to Success

    Data can be transmitted in many ways, but can you find the most efficient one?

Adnan Aziz and Anupam Bhatnagar