Nvidia SPIR-V Compiler Bug or Do Subgroup Shuffle Operations Not Imply Execution Dependency?
Reduce and scan operations are core building blocks of parallel computing, and the latest Nabla release makes them even faster in Vulkan at both the subgroup and workgroup levels.
This article takes a brief look at Nabla's implementation of reduce and scan on the GPU in Vulkan.
I then discuss a missing execution dependency that I expected a subgroup shuffle operation to provide, an issue that showed up only on Nvidia devices and only in some test cases.