Direct IO
Emulated IO path
When all IO is emulated by qemu, this is the network packet receive path:
- The physical NIC DMAs the packet into host kernel memory and raises an interrupt.
- The host kernel interrupt handler executes.
- The packet traverses the host network stack and is routed to the host qemu process.
- The qemu process is marked runnable and notified of new data via poll(2) or select(2).
- When qemu is scheduled, it enqueues the packet on the virtual NIC and injects an interrupt into the guest.
- The guest interrupt handler executes, performing IO instructions to access the virtual NIC.
- Each IO access traps to KVM and is forwarded to the host qemu process for emulation.
- The packet traverses the guest network stack and is routed to the appropriate guest process.
- The guest process is scheduled and reads the data.
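To make the cost of this path concrete, the Python sketch below models the steps above for a single packet and counts the transitions between host kernel, qemu and guest. It is a simplified, hypothetical model: `Step`, `emulated_rx_path` and `context_switches` are illustrative names, the number of IO accesses made by the guest interrupt handler is a free parameter, and each trip through KVM is collapsed into a direct guest-to-qemu hop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    context: str  # "host-kernel", "qemu" or "guest"
    action: str


def emulated_rx_path(guest_io_accesses: int) -> list[Step]:
    """Return the steps for one received packet (hypothetical model).

    guest_io_accesses is an assumed count of IO instructions executed by the
    guest interrupt handler; each one bounces to qemu for emulation.
    """
    steps = [
        Step("host-kernel", "NIC DMA + interrupt"),
        Step("host-kernel", "interrupt handler"),
        Step("host-kernel", "network stack routes packet to qemu"),
        Step("qemu", "woken via poll/select"),
        Step("qemu", "enqueue on virtual NIC, inject guest interrupt"),
        Step("guest", "interrupt handler"),
    ]
    for i in range(guest_io_accesses):
        # Each emulated access leaves the guest and comes back again.
        steps.append(Step("qemu", f"emulate IO access {i}"))
        steps.append(Step("guest", f"resume after IO access {i}"))
    steps.append(Step("guest", "network stack routes packet"))
    steps.append(Step("guest", "process reads data"))
    return steps


def context_switches(steps: list[Step]) -> int:
    """Count transitions between execution contexts along the path."""
    return sum(1 for a, b in zip(steps, steps[1:]) if a.context != b.context)
```

In this model a packet costs 2 + 2n transitions for n guest IO accesses, so every extra register access in the guest driver adds another round trip to qemu.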
Shortening the path
If the guest could access the physical NIC directly, much of this path could be avoided, reducing latency: the packet would no longer pass through the host network stack and qemu, and the guest driver's IO accesses would no longer need emulation. When the guest attempts an IO access, control is returned to KVM, which today sends the access to qemu for emulation. Instead, KVM could perform the IO on the guest's behalf, in effect proxying guest IO operations to the real device.
Allowing direct access like this without an IOMMU sacrifices isolation (reliability and security) for IO performance, because the guest can then program the device to DMA into arbitrary host memory. This may be inappropriate for general-purpose workloads, but still valuable for others.
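The proxying idea can be sketched as an IO exit handler that checks whether the trapped port belongs to the directly accessed NIC. This is a hypothetical Python model, not KVM's actual code: `PortRange`, `IoExitHandler` and the `host_in`/`host_out`/`forward_to_qemu` callbacks are illustrative stand-ins, and it assumes KVM already knows which port range the physical NIC occupies.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PortRange:
    base: int
    size: int

    def contains(self, port: int) -> bool:
        return self.base <= port < self.base + self.size


@dataclass
class IoExitHandler:
    # Ports owned by the physical NIC (assumed to be known to KVM).
    assigned: PortRange
    # Hypothetical stand-ins for real port IO and for the round trip to qemu.
    host_in: Callable[[int], int]
    host_out: Callable[[int, int], None]
    forward_to_qemu: Callable[[int, bool, Optional[int]], Optional[int]]
    proxied: int = 0
    forwarded: int = 0

    def handle(self, port: int, is_write: bool,
               value: Optional[int] = None) -> Optional[int]:
        if self.assigned.contains(port):
            # Direct path: perform the access on the guest's behalf, no qemu.
            self.proxied += 1
            if is_write:
                self.host_out(port, value)
                return None
            return self.host_in(port)
        # Emulated path: today's behaviour, hand the access to qemu.
        self.forwarded += 1
        return self.forward_to_qemu(port, is_write, value)
```

For example, with `assigned=PortRange(0xc000, 0x20)`, a guest write to port 0xc004 goes straight to the device, while an access to 0x3f8 still takes the emulated path through qemu.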
Interrupts
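When the guest attempts to enable an interrupt in the virtual PIC, the host could register that IRQ with request_irq() and supply some IRQ proxying code that forwards it to the guest. However, the host handler cannot acknowledge the interrupt at the device, since only the guest device driver knows how, so the interrupt would stay active until the guest driver is scheduled and acknowledges it. That would be disastrous on shared interrupt lines, where other devices on the same line could not be serviced in the meantime.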
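A small simulation shows why a shared line is the problem case. The sketch below is a hypothetical Python model (`SharedLevelIrq` and its methods are illustrative names) of a level-triggered line shared by a NIC used by the guest and a device used by the host. It assumes the proxy masks the line after forwarding the guest's interrupt, because it cannot acknowledge the device itself; while the line is masked, the host device's interrupt goes unserviced until the guest driver runs.

```python
class SharedLevelIrq:
    """Hypothetical level-triggered IRQ line shared by several devices."""

    def __init__(self, devices: list[str]):
        self.pending = {d: False for d in devices}
        self.masked = False
        self.delivered: list[tuple[str, str]] = []  # (device, handled by)

    def raise_irq(self, dev: str) -> None:
        self.pending[dev] = True

    def asserted(self) -> bool:
        # A level-triggered line stays asserted while any device is pending.
        return any(self.pending.values())

    def service(self, guest_dev: str, host_dev: str) -> None:
        """One pass of the host's interrupt dispatch for this line."""
        if self.masked or not self.asserted():
            return
        if self.pending[host_dev]:
            # The host driver can acknowledge its own device immediately.
            self.pending[host_dev] = False
            self.delivered.append((host_dev, "host"))
        if self.pending[guest_dev]:
            # The proxy cannot acknowledge the guest's device, so to avoid an
            # interrupt storm it masks the line and forwards the IRQ.
            self.masked = True
            self.delivered.append((guest_dev, "forwarded to guest"))

    def guest_ack(self, guest_dev: str) -> None:
        # The guest driver finally runs and acknowledges its device, after
        # which the host can unmask the line again.
        self.pending[guest_dev] = False
        self.masked = False
```

For example, if the NIC raises the line and the host's disk interrupts before the guest has been scheduled, the disk interrupt is handled only after `guest_ack()` has been called.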
DMA problem
An unmodified guest device driver is unaware of its true address space: it programs guest physical addresses into the device, but the device performs DMA using host physical addresses. Somebody needs to perform address translation. Possible solutions:
- Put enough device emulation into the host kernel's IO proxying code that it can translate the addresses given to the device.
- Require slightly modified guest device drivers, which request address translation from the host.
- Require brand new virtual IO guest device drivers.
- Place the guest in memory such that guest physical addresses == host physical addresses.
  - Note: while guest kernels may require memory at guest physical address (gpa) 0, it may be possible to present a sparse guest physical memory map, e.g. place 1MB at 0 and the rest of memory starting at 2GB.
  - This only works when guest memory is physically contiguous, which is not true today and will be even less true once memory is allocated in userspace. It may still be possible in situations with dedicated memory.
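The translation options above can be illustrated with a hypothetical Python sketch of a guest memory map. `Region`, `GuestMemoryMap`, `translate` and `is_identity` are illustrative names, not an existing KVM interface, and memory is modeled as a list of contiguous regions rather than real page tables. `translate` splits a guest DMA buffer into host physical segments, the work that a host-side proxy or a modified guest driver would need done; `is_identity` checks the gpa == hpa placement option.

```python
from dataclasses import dataclass

MB = 1 << 20
GB = 1 << 30


@dataclass(frozen=True)
class Region:
    gpa: int   # guest physical start address
    hpa: int   # host physical address backing it
    size: int  # length in bytes


class GuestMemoryMap:
    """Hypothetical gpa -> hpa map built from contiguous regions."""

    def __init__(self, regions: list[Region]):
        self.regions = sorted(regions, key=lambda r: r.gpa)

    def _region_for(self, gpa: int) -> Region:
        for r in self.regions:
            if r.gpa <= gpa < r.gpa + r.size:
                return r
        raise ValueError(f"gpa {gpa:#x} is not backed by guest RAM")

    def translate(self, gpa: int, length: int) -> list[tuple[int, int]]:
        """Split a guest DMA buffer into (hpa, length) segments for the device.

        A buffer that is contiguous in guest physical memory may span several
        host regions, so it can come back as more than one segment.
        """
        segments = []
        while length > 0:
            r = self._region_for(gpa)
            chunk = min(length, r.gpa + r.size - gpa)
            segments.append((r.hpa + (gpa - r.gpa), chunk))
            gpa += chunk
            length -= chunk
        return segments

    def is_identity(self) -> bool:
        """True when every region has gpa == hpa, so no translation is needed."""
        return all(r.gpa == r.hpa for r in self.regions)
```

For instance, the sparse layout from the note (1MB at 0, the rest starting at 2GB) passes `is_identity()` when both regions sit at the same host addresses, while a buffer crossing two scattered host regions comes back as two segments. That split is exactly what the identity approach avoids, and why it depends on contiguous memory.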