Inlines implementation of exclusive instructions into JITted code,
improving performance of applications relying heavily on these
instructions.
We also fastmem these instructions for additional speed, with
support for appropriate recompilation on fastmem failure.
An unsafe optimization to disable the intercore global_monitor is also
provided, should one wish to rely solely on cmpxchg semantics for
safety.
See also: merryhime/dynarmic#664
Decouples the CPU debugging mode from the enumeration to its own
boolean. After this, it moves the CPU Debugging tab over to a sub tab
underneath the Debug tab in the configuration UI.
The current CPU accuracy settings in yuzu are fairly polarized and
require more than common knowledge to know what the optimal settings for
yuzu would be. This adds a curated option called 'Auto' that applies a
few at the moment known-good unsafe optimizations to Dynarmic.
Removes common_sizes.h in favor of having `_KiB`, `_MiB`, `_GiB`, etc
user-literals within literals.h.
To keep the global namespace clean, users will have to use:
```
using namespace Common::Literals;
```
to access these literals.
We just create one memory subsystem. This is a constant all the time.
So there is no need to call the non-inlined parent.Memory() helper on every callback.
This code was used to switch the CPU ID on thread switches.
However since "hle: kernel: multicore: Replace n-JITs impl. with 4 JITs.", the CPU ID is not a constant.
This has been dead code since this rewrite, and dropped in dynarmic as well. So there is no need to keep it.
Now that we have most of core free of shadowing, we can enable the
warning as an error to catch anything that may be remaining and also
eliminate this class of logic bug entirely.
Resolves shadowing warnings that aren't in a particularly large
subsection of core. Brings us closer to turning -Wshadow into an error.
All that remains now is for cases in the kernel (left untouched for now
since a big change by bunnei is pending), and a few left over in the
service code (will be tackled next).
Squash attributes into the pointer's integer, making them an uintptr_t
pair containing 2 bits at the bottom and then the pointer. These bits
are currently unused thanks to alignment requirements.
Configure Dynarmic to mask out these bits on pointer reads.
While we are at it, remove some unused attributes carried over from
Citra.
Read/Write and other hot functions use a two step unpacking process that
is less readable to stop MSVC from emitting an extra AND instruction in
the hot path:
mov rdi,rcx
shr rdx,0Ch
mov r8,qword ptr [rax+8]
mov rax,qword ptr [r8+rdx*8]
mov rdx,rax
-and al,3
and rdx,0FFFFFFFFFFFFFFFCh
je Core::Memory::Memory::Impl::Read<unsigned char>
mov rax,qword ptr [vaddr]
movzx eax,byte ptr [rdx+rax]
Removes all remaining usages of the global system instance. After this,
migration can begin to migrate to being constructed and managed entirely
by the various frontends.
Unicorn long-since lost most of its use, due to dynarmic gaining support
for handling most instructions. At this point any further issues
encountered should be used to make dynarmic better.
This also allows us to remove our dependency on Python.
Allows some implementations to avoid completely zeroing out the internal
buffer of the optional, and instead only set the validity byte within
the structure.
This also makes it consistent how we return empty optionals.
This commit: Implements CPU Interrupts, Replaces Cycle Timing for Host
Timing, Reworks the Kernel's Scheduler, Introduce Idle State and
Suspended State, Recreates the bootmanager, Initializes Multicore
system.
We can also allow unicorn to be constructed in 32-bit mode or 64-bit
mode to satisfy the need for both interpreter instances.
Allows this code to compile successfully of non x86-64 architectures.
The Write functions are used slightly less than the Read functions,
which make these a bit nicer to move over.
The only adjustments we really need to make here are to Dynarmic's
exclusive monitor instance. We need to keep a reference to the currently
active memory instance to perform exclusive read/write operations.
With all of the trivial parts of the memory interface moved over, we can
get right into moving over the bits that are used.
Note that this does require the use of GetInstance from the global
system instance to be used within hle_ipc.cpp and the gdbstub. This is
fine for the time being, as they both already rely on the global system
instance in other functions. These will be removed in a change directed
at both of these respectively.
For now, it's sufficient, as it still accomplishes the goal of
de-globalizing the memory code.
Amends a few interfaces to be able to handle the migration over to the
new Memory class by passing the class by reference as a function
parameter where necessary.
Notably, within the filesystem services, this eliminates two ReadBlock()
calls by using the helper functions of HLERequestContext to do that for
us.
This was initially necessary when AArch64 JIT emulation was in its
infancy and all memory-related instructions weren't implemented.
Given the JIT now has all of these facilities implemented, we can remove
these functions from the CPU interface.
Our initialization process is a little wonky than one would expect when
it comes to code flow. We initialize the CPU last, as opposed to
hardware, where the CPU obviously needs to be first, otherwise nothing
else would work, and we have code that adds checks to get around this.
For example, in the page table setting code, we check to see if the
system is turned on before we even notify the CPU instances of a page
table switch. This results in dead code (at the moment), because the
only time a page table switch will occur is when the system is *not*
running, preventing the emulated CPU instances from being notified of a
page table switch in a convenient manner (technically the code path
could be taken, but we don't emulate the process creation svc handlers
yet).
This moves the threads creation into its own member function of the core
manager and restores a little order (and predictability) to our
initialization process.
Previously, in the multi-threaded cases, we'd kick off several threads
before even the main kernel process was created and ready to execute (gross!).
Now the initialization process is like so:
Initialization:
1. Timers
2. CPU
3. Kernel
4. Filesystem stuff (kind of gross, but can be amended trivially)
5. Applet stuff (ditto in terms of being kind of gross)
6. Main process (will be moved into the loading step in a following
change)
7. Telemetry (this should be initialized last in the future).
8. Services (4 and 5 should ideally be alongside this).
9. GDB (gross. Uses namespace scope state. Needs to be refactored into a
class or booted altogether).
10. Renderer
11. GPU (will also have its threads created in a separate step in a
following change).
Which... isn't *ideal* per-se, however getting rid of the wonky
intertwining of CPU state initialization out of this mix gets rid of
most of the footguns when it comes to our initialization process.
Adjusts the interface of the wrappers to take a system reference, which
allows accessing a system instance without using the global accessors.
This also allows getting rid of all global accessors within the
supervisor call handling code. While this does make the wrappers
themselves slightly more noisy, this will be further cleaned up in a
follow-up. This eliminates the global system accessors in the current
code while preserving the existing interface.