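Synchronization

Synchronization is one of the most powerful but also most complex parts of Vulkan. The application developer is responsible for managing synchronization using the various Vulkan synchronization primitives. Improper use of synchronization can lead to hard-to-find bugs, and can also hurt performance by making the GPU wait unnecessarily.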

Khronos provides a set of examples and articles on understanding Vulkan synchronization that show how to use some of the synchronization primitives. There are also the Vulkan talks by Tobias Hector: [slides, part 1](https://www.khronos.org/assets/uploads/developers/library/2017-vulkan-devu-vancouver/009 - Synchronization - Keeping Your Device Fed.pdf) (video 1) and [slides, part 2](https://www.khronos.org/assets/uploads/developers/library/2018-vulkanised/06-Keeping Your Device Fed v4_Vulkanised2018.pdf) (video 2).

The following diagram gives an overview of the differences between VkEvent, VkFence, and VkSemaphore:

synchronization_overview.png

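Validation

The Khronos Validation Layer implements some synchronization validation, which can be enabled through the Vulkan Configurator included with the Vulkan SDK. A Khronos blog post also discusses the contents of the synchronization validation white paper.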

Pipeline Barriers

Pipeline barriers control which pipeline stages must wait on which earlier pipeline stages when a command buffer is executed:

synchronization_pipeline_barrieres.png

Pipeline barriers can be hard to understand at first. Several Khronos talks and articles explain them in depth.

VK_KHR_synchronization2

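The VK_KHR_synchronization2 extension overhauls the original core synchronization APIs to reduce complexity for application developers, and adds some additional features.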

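Note

VK_KHR_synchronization2 was promoted to core in Vulkan 1.3.

The extension improves pipeline barriers, events, image layout transitions, and queue submission. The sections below describe how the VK_KHR_synchronization2 functionality differs from the original Vulkan synchronization operations, with examples of how application code can use the extension.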

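Rethinking Pipeline Stages and Access Flags

One main change with the extension is that pipeline stages and access flags are now specified together in the memory barrier structures, which makes the connection between the two more obvious.

The new VkDependencyInfoKHR structure wraps all of the barriers into a single structure: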

VK_KHR_synchronization2_stage_access

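Adding Barriers for Setting Events

With the introduction of VkDependencyInfoKHR, vkCmdSetEvent2KHR, unlike vkCmdSetEvent, can add barriers, which makes VkEvent more useful. Because a synchronization2 implementation of VkEvent may differ substantially from the Vulkan 1.2 one, mixing extension and core API calls on the same VkEvent is not allowed. For example, an application must not call vkCmdSetEvent2KHR() and then vkCmdWaitEvents().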

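Reusing the Same Pipeline Stage and Access Flag Names

The 64-bit VkAccessFlags2KHR type was created because the 32-bit VkAccessFlags was running out of bits. To prevent the same issue with VkPipelineStageFlags, the 64-bit VkPipelineStageFlags2KHR type was also created.

Not all C/C++ compilers provide 64-bit enumeration types, so the new flag values are defined as static const values instead of enumerators. As a result, there are no types equivalent to VkPipelineStageFlagBits and VkAccessFlagBits. Some functions, such as vkCmdWriteTimestamp(), took a single Bits value rather than a mask of multiple bits. Their replacements, such as vkCmdWriteTimestamp2KHR(), take the Flags type instead, so application code needs to be adjusted accordingly.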
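To make the point about constants versus enumerators concrete, here is a minimal sketch. It deliberately does not include the Vulkan headers: Flags64, kStageLowBit, kStageHighBit, and isSingleBit are hypothetical stand-ins invented for this illustration, not Vulkan declarations.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 64-bit flag type (not the real Vulkan typedef).
typedef std::uint64_t Flags64;

// Each bit is a static const value of the mask type itself, so no separate
// single-bit "FlagBits" enum type exists.
static const Flags64 kStageLowBit  = 0x00000800ULL;   // would still fit in 32 bits
static const Flags64 kStageHighBit = 0x100000000ULL;  // bit 32: needs a 64-bit type

// A function that used to accept exactly one bit now receives the mask type,
// so the type system no longer enforces "one bit only". A helper like this
// can check that a caller really passed a single bit.
constexpr bool isSingleBit(Flags64 mask) {
    return mask != 0 && (mask & (mask - 1)) == 0;
}
```

The design consequence is that "one bit only" becomes a runtime or validation-layer concern rather than something expressed by the parameter type.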

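The new flags include the same bits as the original synchronization flags, with the same base name and the same values. Old flags can be used directly in the new APIs. The following two examples show the naming differences:

  • VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT becomes VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT_KHR
  • VK_ACCESS_SHADER_READ_BIT becomes VK_ACCESS_2_SHADER_READ_BIT_KHR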
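Because shared bits keep their values, carrying a legacy mask over to the new type amounts to a widening conversion. The sketch below illustrates this with local stand-ins rather than the Vulkan headers: LegacyStageFlags, StageFlags2, the two constants, and toStageFlags2 are hypothetical names, and the numeric value was copied by hand and should be checked against vulkan_core.h.

```cpp
#include <cstdint>

// Local stand-ins (assumption: the real types are 32-bit and 64-bit unsigned integers).
typedef std::uint32_t LegacyStageFlags;  // plays the role of VkPipelineStageFlags
typedef std::uint64_t StageFlags2;       // plays the role of VkPipelineStageFlags2KHR

// Hand-copied value for illustration only; verify against vulkan_core.h.
static const LegacyStageFlags LEGACY_COMPUTE_SHADER_BIT = 0x00000800;
static const StageFlags2 STAGE_2_COMPUTE_SHADER_BIT = 0x00000800ULL;

// Widening a legacy mask is a plain integer conversion: no bit is remapped.
constexpr StageFlags2 toStageFlags2(LegacyStageFlags legacy) {
    return static_cast<StageFlags2>(legacy);
}
```

A helper like this only covers the mechanical part of a port. Stages that the extension deprecates, such as TOP_OF_PIPE and BOTTOM_OF_PIPE, are better translated by hand.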

VkSubpassDependency

To use the new pipeline stage and access flags with a subpass dependency, pass a VkMemoryBarrier2KHR in the pNext chain of VkSubpassDependency2.

For example:

// Without VK_KHR_synchronization2
VkSubpassDependency dependency = {
.srcSubpass = 0,
.dstSubpass = 1,
.srcStageMask = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT |
VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT,
.dstStageMask = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
.srcAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT,
.dstAccessMask = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT,
.dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT
};

becomes:

// With VK_KHR_synchronization2
VkMemoryBarrier2KHR memoryBarrier = {
.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER_2_KHR,
.pNext = nullptr,
.srcStageMask = VK_PIPELINE_STAGE_2_EARLY_FRAGMENT_TESTS_BIT_KHR |
VK_PIPELINE_STAGE_2_LATE_FRAGMENT_TESTS_BIT_KHR,
.dstStageMask = VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT_KHR,
.srcAccessMask = VK_ACCESS_2_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT_KHR,
.dstAccessMask = VK_ACCESS_2_INPUT_ATTACHMENT_READ_BIT_KHR
};

// The 4 fields unset are ignored according to the spec
// When VkMemoryBarrier2KHR is passed into pNext
VkSubpassDependency2 dependency = {
.sType = VK_STRUCTURE_TYPE_SUBPASS_DEPENDENCY_2,
.pNext = &memoryBarrier,
.srcSubpass = 0,
.dstSubpass = 1,
.dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT
};

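Splitting Up Pipeline Stages and Access Masks

Some VkAccessFlags and VkPipelineStageFlags values were ambiguous about what they target in hardware. The new VkAccessFlags2KHR and VkPipelineStageFlags2KHR types split these values up for each case, while keeping the old values for compatibility with existing code.

Splitting up VK_PIPELINE_STAGE_VERTEX_INPUT_BIT

VK_PIPELINE_STAGE_VERTEX_INPUT_BIT (now VK_PIPELINE_STAGE_2_VERTEX_INPUT_BIT_KHR) previously combined index input and vertex input in a single pipeline stage. Each now has its own new stage: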

  • VK_PIPELINE_STAGE_2_INDEX_INPUT_BIT_KHR
  • VK_PIPELINE_STAGE_2_VERTEX_ATTRIBUTE_INPUT_BIT_KHR

Splitting up VK_PIPELINE_STAGE_TRANSFER_BIT

VK_PIPELINE_STAGE_TRANSFER_BIT (now VK_PIPELINE_STAGE_2_ALL_TRANSFER_BIT_KHR) was a single combined pipeline stage and is now split into 4 new stages:

  • VK_PIPELINE_STAGE_2_COPY_BIT_KHR
  • VK_PIPELINE_STAGE_2_RESOLVE_BIT_KHR
  • VK_PIPELINE_STAGE_2_BLIT_BIT_KHR
  • VK_PIPELINE_STAGE_2_CLEAR_BIT_KHR
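The practical benefit of the split is that a barrier can name only the transfer operation it actually depends on instead of the combined transfer stage. A minimal sketch of that selection follows. It uses local stand-ins instead of the Vulkan headers: TransferOp and stageFor are hypothetical helpers, and the numeric values were copied by hand and should be checked against vulkan_core.h.

```cpp
#include <cstdint>

typedef std::uint64_t StageFlags2;  // stand-in for VkPipelineStageFlags2KHR

// Hand-copied values for illustration only; verify against vulkan_core.h.
static const StageFlags2 STAGE_2_ALL_TRANSFER_BIT = 0x00001000ULL;
static const StageFlags2 STAGE_2_COPY_BIT         = 0x100000000ULL;
static const StageFlags2 STAGE_2_RESOLVE_BIT      = 0x200000000ULL;
static const StageFlags2 STAGE_2_BLIT_BIT         = 0x400000000ULL;
static const StageFlags2 STAGE_2_CLEAR_BIT        = 0x800000000ULL;

// Hypothetical classification of the transfer command being synchronized.
enum class TransferOp { Copy, Resolve, Blit, Clear };

// Pick the narrowest stage for one kind of transfer command.
constexpr StageFlags2 stageFor(TransferOp op) {
    return op == TransferOp::Copy      ? STAGE_2_COPY_BIT
           : op == TransferOp::Resolve ? STAGE_2_RESOLVE_BIT
           : op == TransferOp::Blit    ? STAGE_2_BLIT_BIT
                                       : STAGE_2_CLEAR_BIT;
}
```

For example, a barrier that only has to wait for earlier copies and clears could use stageFor(TransferOp::Copy) | stageFor(TransferOp::Clear) as its source stage mask, leaving blits and resolves unaffected.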

Splitting up VK_ACCESS_SHADER_READ_BIT

VK_ACCESS_SHADER_READ_BIT (now VK_ACCESS_2_SHADER_READ_BIT_KHR) was a combined flag and is now split into 3 new access flags:

  • VK_ACCESS_2_UNIFORM_READ_BIT_KHR
  • VK_ACCESS_2_SHADER_SAMPLED_READ_BIT_KHR
  • VK_ACCESS_2_SHADER_STORAGE_READ_BIT_KHR

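Combining Shader Stages for Pre-rasterization

Besides splitting up flags, VK_PIPELINE_STAGE_2_PRE_RASTERIZATION_SHADERS_BIT_KHR was added to combine the shader stages that run before rasterization into a single, convenient flag.

VK_ACCESS_SHADER_WRITE_BIT Alias

VK_ACCESS_SHADER_WRITE_BIT (now VK_ACCESS_2_SHADER_WRITE_BIT_KHR) was given the alias VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT_KHR to better describe the shader resources the access flag covers.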

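Deprecating TOP_OF_PIPE and BOTTOM_OF_PIPE

The use of VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT and VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT is now deprecated. They are replaced by the following 4 equivalent cases:

  • VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT in the first synchronization scope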

    // From
    .srcStageMask = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;

    // To
    .srcStageMask = VK_PIPELINE_STAGE_2_NONE_KHR;
    .srcAccessMask = VK_ACCESS_2_NONE_KHR;
  • VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT in the second synchronization scope

    // From
    .dstStageMask = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;

    // To
    .dstStageMask = VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT_KHR;
    .dstAccessMask = VK_ACCESS_2_NONE_KHR;
  • VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT in the first synchronization scope

    // From
    .srcStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT;

    // To
    .srcStageMask = VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT_KHR;
    .srcAccessMask = VK_ACCESS_2_NONE_KHR;
  • VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT in the second synchronization scope

    // From
    .dstStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT;

    // To
    .dstStageMask = VK_PIPELINE_STAGE_2_NONE_KHR;
    .dstAccessMask = VK_ACCESS_2_NONE_KHR;
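The four cases above can be folded into one small translation helper, which may be handy when porting existing barriers. This is only a sketch under stated assumptions: it does not include the Vulkan headers, Scope and replaceDeprecatedStage are hypothetical names, the numeric values were copied by hand and should be checked against vulkan_core.h, and it only handles a mask that is exactly one of the two deprecated stages.

```cpp
#include <cstdint>

typedef std::uint64_t StageFlags2;  // stand-in for VkPipelineStageFlags2KHR

// Hand-copied values for illustration only; verify against vulkan_core.h.
static const StageFlags2 LEGACY_TOP_OF_PIPE_BIT    = 0x00000001ULL;
static const StageFlags2 LEGACY_BOTTOM_OF_PIPE_BIT = 0x00002000ULL;
static const StageFlags2 STAGE_2_NONE              = 0ULL;
static const StageFlags2 STAGE_2_ALL_COMMANDS_BIT  = 0x00010000ULL;

// First scope = srcStageMask, second scope = dstStageMask.
enum class Scope { First, Second };

// Replace a deprecated stage according to the four cases listed above.
// Any other stage value is returned unchanged.
constexpr StageFlags2 replaceDeprecatedStage(StageFlags2 stage, Scope scope) {
    return stage == LEGACY_TOP_OF_PIPE_BIT
               ? (scope == Scope::First ? STAGE_2_NONE : STAGE_2_ALL_COMMANDS_BIT)
           : stage == LEGACY_BOTTOM_OF_PIPE_BIT
               ? (scope == Scope::First ? STAGE_2_ALL_COMMANDS_BIT : STAGE_2_NONE)
               : stage;
}
```

In all four cases the matching access mask is VK_ACCESS_2_NONE_KHR, so the helper only needs to return a stage.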

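Making Use of New Image Layouts

VK_KHR_synchronization2 adds 2 new image layouts, VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL_KHR and VK_IMAGE_LAYOUT_READ_ONLY_OPTIMAL_KHR, to simplify layout transitions.

The following example writes to a color attachment and a depth/stencil attachment during one draw, and then samples both in the next draw. Previously, the application had to make sure the layouts and access masks matched correctly: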

VkImageMemoryBarrier colorImageMemoryBarrier = {
.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
.dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
.oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
};

VkImageMemoryBarrier depthStencilImageMemoryBarrier = {
.srcAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT,
.dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
.oldLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
};

With VK_KHR_synchronization2 this becomes much simpler:

VkImageMemoryBarrier2KHR colorImageMemoryBarrier = {
.srcAccessMask = VK_ACCESS_2_COLOR_ATTACHMENT_WRITE_BIT_KHR,
.dstAccessMask = VK_ACCESS_2_SHADER_READ_BIT_KHR,
.oldLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL_KHR, // new layout from VK_KHR_synchronization2
.newLayout = VK_IMAGE_LAYOUT_READ_ONLY_OPTIMAL_KHR // new layout from VK_KHR_synchronization2
};

VkImageMemoryBarrier2KHR depthStencilImageMemoryBarrier = {
.srcAccessMask = VK_ACCESS_2_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT_KHR,
.dstAccessMask = VK_ACCESS_2_SHADER_READ_BIT_KHR,
.oldLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL_KHR, // new layout from VK_KHR_synchronization2
.newLayout = VK_IMAGE_LAYOUT_READ_ONLY_OPTIMAL_KHR // new layout from VK_KHR_synchronization2
};

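In the new case, VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL_KHR applies itself contextually based on the format of the image it is used with. So as long as colorImageMemoryBarrier is used on a color format, VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL_KHR maps to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.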
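One way to picture this contextual behavior is as a lookup from the kind of format to the older, format-specific layout. The sketch below is purely conceptual and is not how an implementation or the Vulkan API works: Layout, FormatKind, and resolveAttachmentOptimal are hypothetical names, and treating depth/stencil formats as mapping to the depth/stencil attachment layout is an assumption made by analogy with the color case.

```cpp
// Hypothetical stand-ins for the older, format-specific layouts.
enum class Layout {
    ColorAttachmentOptimal,         // VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
    DepthStencilAttachmentOptimal,  // VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
};

// Hypothetical classification of the image format used with the barrier.
enum class FormatKind { Color, DepthStencil };

// Conceptual view: the generic "attachment optimal" layout behaves like the
// format-specific layout that matches the image's format.
constexpr Layout resolveAttachmentOptimal(FormatKind kind) {
    return kind == FormatKind::Color ? Layout::ColorAttachmentOptimal
                                     : Layout::DepthStencilAttachmentOptimal;
}
```

The application never performs this lookup itself. It passes the generic layout and the format supplies the context.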

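Additionally, with VK_KHR_synchronization2, if oldLayout is equal to newLayout, no layout transition is performed. The layout specified does not even need to match the actual layout of the image, so the following barrier is valid: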

VkImageMemoryBarrier depthStencilImageMemoryBarrier = {
// other fields omitted
.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED,
.newLayout = VK_IMAGE_LAYOUT_UNDEFINED,
};

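New Submission Flow

VK_KHR_synchronization2 adds the vkQueueSubmit2KHR command, which simplifies the function's parameters by wrapping command buffers and semaphores in extensible structures. These structures incorporate the changes from Vulkan 1.1, VK_KHR_device_group, and VK_KHR_timeline_semaphore.

Take the following submission flow as an example: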

VkSemaphore waitSemaphore;
VkSemaphore signalSemaphore;
VkCommandBuffer commandBuffers[8];

// Possible pNext from VK_KHR_timeline_semaphore
VkTimelineSemaphoreSubmitInfo timelineSemaphoreSubmitInfo = {
// ...
.pNext = nullptr
};

// Possible pNext from VK_KHR_device_group
VkDeviceGroupSubmitInfo deviceGroupSubmitInfo = {
// ...
.pNext = &timelineSemaphoreSubmitInfo
};

// Possible pNext from Vulkan 1.1
VkProtectedSubmitInfo protectedSubmitInfo = {
// ...
.pNext = &deviceGroupSubmitInfo
};

// pWaitDstStageMask points to an array with one entry per wait semaphore
VkPipelineStageFlags waitDstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;

VkSubmitInfo submitInfo = {
.pNext = &protectedSubmitInfo, // Chains all 3 extensible structures
.waitSemaphoreCount = 1,
.pWaitSemaphores = &waitSemaphore,
.pWaitDstStageMask = &waitDstStageMask,
.commandBufferCount = 8,
.pCommandBuffers = commandBuffers,
.signalSemaphoreCount = 1,
.pSignalSemaphores = &signalSemaphore
};

vkQueueSubmit(queue, 1, &submitInfo, fence);

This can be converted to vkQueueSubmit2KHR as follows:

// Uses same semaphore and command buffer handles
VkSemaphore waitSemaphore;
VkSemaphore signalSemaphore;
VkCommandBuffer commandBuffers[8];

VkSemaphoreSubmitInfoKHR waitSemaphoreSubmitInfo = {
.semaphore = waitSemaphore,
.value = 1, // replaces VkTimelineSemaphoreSubmitInfo
.stageMask = VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT_KHR,
.deviceIndex = 0, // replaces VkDeviceGroupSubmitInfo
};

// Note this is allowing a stage to set the signal operation
VkSemaphoreSubmitInfoKHR signalSemaphoreSubmitInfo = {
.semaphore = signalSemaphore,
.value = 2, // replaces VkTimelineSemaphoreSubmitInfo
.stageMask = VK_PIPELINE_STAGE_2_VERTEX_SHADER_BIT_KHR, // when to signal semaphore
.deviceIndex = 0, // replaces VkDeviceGroupSubmitInfo
};

// Need one for each VkCommandBuffer
VkCommandBufferSubmitInfoKHR commandBufferSubmitInfos[8] = {
// ...
{
.commandBuffer = commandBuffers[i],
.deviceMask = 0 // replaces VkDeviceGroupSubmitInfo
},
};

VkSubmitInfo2KHR submitInfo = {
.pNext = nullptr, // All 3 structs above are built into VkSubmitInfo2KHR
.flags = VK_SUBMIT_PROTECTED_BIT_KHR, // also can be zero, replaces VkProtectedSubmitInfo
.waitSemaphoreInfoCount = 1,
.pWaitSemaphoreInfos = &waitSemaphoreSubmitInfo,
.commandBufferInfoCount = 8,
.pCommandBufferInfos = commandBufferSubmitInfos,
.signalSemaphoreInfoCount = 1,
.pSignalSemaphoreInfos = &signalSemaphoreSubmitInfo
};

vkQueueSubmit2KHR(queue, 1, &submitInfo, fence);

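The difference between the two code examples above is that vkQueueSubmit2KHR signals VkSemaphore signalSemaphore when the vertex shader stage is complete, whereas vkQueueSubmit waits until the end of the submission before signaling it.

To make vkQueueSubmit2KHR signal the semaphore with the same behavior as vkQueueSubmit, set stageMask to VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT: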

// Waits until everything is done
VkSemaphoreSubmitInfoKHR signalSemaphoreSubmitInfo = {
// ...
.stageMask = VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
// ...
};

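Emulation Layer

For devices that do not support this extension, there is a portable implementation in the Vulkan-ExtensionLayer repository. For more information, see the layer documentation and the Sync2Compat.Vulkan10 test case.

Note

The VK_KHR_synchronization2 specification lists VK_KHR_create_renderpass2 and VK_KHR_get_physical_device_properties2 as requirements. Using synchronization2 without these extensions may result in validation errors.

Synchronization Code Examples

To be continued...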