Making the stream buffer resident increases GPU usage significantly on
some games. This seems to be addressed invalidating the stream buffer
with InvalidateBufferData instead of using a Unmap + Map (with
invalidation flags).
Switch games are allowed to bind less data than what they use in a
vertex buffer, the expected behavior here is that these values are read
as zero. At the moment of writing this only D3D12, OpenGL and NVN through
NV_vertex_buffer_unified_memory support vertex buffer with a size limit.
In theory this could be emulated on Vulkan creating a new VkBuffer for
each (handle, offset, length) tuple and binding the expected data to it.
This is likely going to be slow and memory expensive when used on the
vertex buffer and we have to do it on all draws because we can't know
without analyzing indices when a game is going to read vertex data out
of bounds.
This is not a problem on OpenGL's BufferAddressRangeNV because it takes
a length parameter, unlike Vulkan's CmdBindVertexBuffers that only takes
buffers and offsets (the length is implicit in VkBuffer). It isn't a
problem on D3D12 either, because D3D12_VERTEX_BUFFER_VIEW on
IASetVertexBuffers takes SizeInBytes as a parameter (although I am not
familiar with robustness on D3D12).
Currently this only implements buffer ranges for vertex buffers,
although indices can also be affected. A KHR_robustness profile is not
created, but Nvidia's driver reads out of bound vertex data as zero
anyway, this might have to be changed in the future.
- Fixes SMO random triangles when capturing an enemy, getting hit, or
looking at the environment on certain maps.
Make stream buffer and cached buffers as resident and query their
address. This allows us to use GPU addresses for several proprietary
Nvidia extensions.
Expose NV_vertex_buffer_unified_memory when the driver supports it.
This commit adds a function the determine if a GL_RENDERER is a Turing
GPU. This is required because on Turing GPUs Nvidia's driver crashes
when the buffer is marked as resident or on DeleteBuffers. Without a
synchronous debug output (single threaded driver), it's likely that
the driver will crash in the first blocking call.
Add HSET2_IMM. Due to the complexity of the encoding avoid using
BitField unions and read the relevant bits from the code itself.
This is less error prone.
Update validation layer string to VK_LAYER_KHRONOS_validation.
While we are at it, properly check for available validation layers
before enabling them.