Some notes taken from reading the paper ‘Xen and the Art of Virtualization’.
Xen is an x86 virtual machine monitor allowing multiple commodity OSs to share conventional hardware in a safe and resource managed fashion.
Virtual machine
goal: isolation, hosting services
design principles
- must be isolated from one another
- should support a variety of different OSes
- the performance overhead should be small
VM vs process
- why not just use processes? processes do not provide performance isolation
VM vs container
- both of them have performance isolation
- VM can run different OS
VM vs. exokernel
- VMs give the illusion of having the entire physical machine -> no need to modify the OS
- a libOS must explicitly request physical resources -> the OS needs to be modified
Full virtualization vs paravirtualization
full virtualization - (VMware)
- guest OS should not be modified
- OS -> hardware: software writes registers (e.g. page fault, disk access…); this state is the VM context
- VMM saves the VM context and swaps it when switching VMs
- VMM: kernel mode, guest OS: user mode
- CRUX: the guest OS may execute a privileged instruction, but it runs in user mode -> the instruction raises an exception -> handled by the VMM
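A toy sketch of the trap-and-emulate idea in the CRUX bullet above. The instruction names, `PrivilegeFault`, `cpu_execute`, and the `VMM` class are all made up for illustration; they do not model real hardware or any VMware interface.

```python
class PrivilegeFault(Exception):
    """Raised when user-mode code attempts a privileged instruction."""


# Hypothetical instruction names, chosen only for this sketch.
PRIVILEGED = {"write_cr3", "disable_interrupts"}


def cpu_execute(instr, mode):
    """Model the hardware: privileged instructions trap unless in kernel mode."""
    if instr in PRIVILEGED and mode != "kernel":
        raise PrivilegeFault(instr)
    return f"executed {instr}"


class VMM:
    def __init__(self):
        self.emulated = []  # instructions emulated on the guest's behalf

    def run_guest(self, instructions):
        results = []
        for instr in instructions:
            try:
                # The guest OS always runs de-privileged (user mode).
                results.append(cpu_execute(instr, mode="user"))
            except PrivilegeFault:
                # Trap: the VMM emulates the effect against the VM's state.
                self.emulated.append(instr)
                results.append(f"emulated {instr}")
        return results
```

Ordinary instructions run directly; only the privileged ones take the slow path through the VMM. This only works if every privileged instruction actually traps.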
why difficult for x86
- on x86 some privileged instructions do not trap -> binary rewriting
- x86 memory is difficult to virtualize
paravirtualization
- we can modify the OS slightly: benefits? drawbacks?
- Hypercalls: similar to exokernel system calls; the key difference from full virtualization. Ex: replace a problematic privileged operation with a hypercall
design principles of paravirtualization
- support for unmodified application binaries
- support full multi-application OSes
- should obtain high performance and strong resource isolation
- completely hiding the effects of resource virtualization from guest OSes risks both correctness and performance
Xen vs Denali
- Denali does not target existing ABIs -> does not fully support x86 segmentation
- the Denali implementation does not address the problem of supporting application multiplexing / Xen hosts a real OS, which may itself securely multiplex applications
- the Denali VMM performs all paging to and from disk / in Xen each guest OS performs its own paging using its own guaranteed memory reservation
- Denali virtualizes the ‘namespaces’ of machine resources / Xen relies on access control within the hypervisor
Virtualization in Xen
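memory virtualization:
- unix: VPN->PPN
- full virtualization: two-stage mapping, VPN->PPN->hardware (machine) address; the shadow page table holds pointers to hardware addresses
- Xen:
- the guest OS has direct read access to its page tables, but changes them through a hypercall; memory is managed by the VMM, which validates the updates
- a guest cannot install fully-privileged segment descriptors, and its segments cannot overlap with the top end of the linear address space
- Xen lives at the top of every address space, which avoids a TLB flush when entering and leaving the VMM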
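A minimal sketch contrasting the two memory schemes above. The frame numbers, `build_shadow`, and the `Hypervisor` class are assumptions made for illustration, not Xen's interface, and the ownership check here is much simpler than real validation.

```python
def build_shadow(guest_pt, p2m):
    """Full virtualization: compose guest VPN->PPN with the VMM's
    PPN->machine-frame map into the shadow VPN->machine-frame table."""
    return {vpn: p2m[ppn] for vpn, ppn in guest_pt.items() if ppn in p2m}


class Hypervisor:
    """Xen-style: the guest reads its page table directly, but every
    write goes through a hypercall that is validated first."""

    def __init__(self, owned_frames):
        self.owned = owned_frames  # domain id -> set of machine frames it owns

    def update_pte(self, dom, page_table, vpn, frame):
        # A domain may map only machine frames that it owns.
        if frame not in self.owned[dom]:
            raise PermissionError(f"domain {dom} does not own frame {frame}")
        page_table[vpn] = frame
```

In the shadow scheme the VMM maintains a second table behind the guest's back. In the Xen scheme there is a single table holding machine frames, and the cost moves to the validated update path.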
cpu virtualization
- privilege: the guest OS must be modified to run at a lower privilege level than Xen. With only two levels it would run at the same level as apps, in a separate address space; on x86 the guest OS can run in ring 1 (Xen in ring 0, apps in ring 3)
- exception handlers:
- a hypercall tells the VMM where the guest's exception handlers are located (a descriptor table of handlers); the VMM validates it and tells the hardware
- the guest OS page fault handler must be modified, because it would otherwise read the faulting address from a privileged register (CR2)
- system call: the guest can install a fast handler that the processor calls directly, without indirecting via ring 0
- hypercall
- has both real and virtual time
domain 0
- a domain created at boot time and permitted to use the control interface
- privileged domain for management: can set VMM parameters and control other domains
device virtualization:
- full: two-stage
- partial (paravirtual): DMA, shared memory, asynchronous I/O rings
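A rough sketch of an asynchronous I/O ring as a fixed-size circular queue with producer and consumer indices. It models only the request half (guest produces, Xen consumes); the class and field names are assumptions for illustration, not Xen's actual ring layout.

```python
class IORing:
    """Fixed-size circular queue of descriptors shared by a domain and Xen."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0  # advanced by the guest when it queues a request
        self.req_cons = 0  # advanced by Xen when it takes a request

    def put_request(self, desc):
        """Guest side: queue a descriptor, or report that the ring is full."""
        if self.req_prod - self.req_cons == self.size:
            return False  # ring full; the guest must wait for Xen to consume
        self.slots[self.req_prod % self.size] = desc
        self.req_prod += 1
        return True

    def take_request(self):
        """Xen side: take the oldest outstanding descriptor, if any."""
        if self.req_cons == self.req_prod:
            return None  # nothing pending
        desc = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        return desc
```

Each side advances only its own index, so the guest can queue several requests without waiting on each one; that is what makes the ring asynchronous.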
control
- hypercall: domain -> Xen (sync)
- event: Xen -> domain (async)
- pending: stored in per-domain bitmask
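A small sketch of the per-domain pending bitmask for events. The `Domain` class, `send_event`, and the event numbers are hypothetical; a real guest would run a registered handler where this sketch only logs the event number.

```python
class Domain:
    def __init__(self):
        self.pending = 0  # one bit per event type
        self.log = []     # events handled so far, in handling order

    def handle_events(self):
        """Guest side: service every pending event, lowest bit first."""
        event = 0
        while self.pending:
            if self.pending & (1 << event):
                self.pending &= ~(1 << event)  # clear the bit
                self.log.append(event)         # stand-in for a real handler
            event += 1


def send_event(domain, event):
    """Xen side: mark an event pending. This is asynchronous; the domain
    picks it up later, when it next handles events."""
    domain.pending |= 1 << event
```

Because each event type has a single bit, sending the same event twice before the domain handles it leaves just one pending notification.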
multiple notions of time
- real time
- virtual time and BVT scheduling
- wall-clock time
network
- packet filter (rules): domain 0 is responsible for inserting and removing rules
- zero-copy reception: uses descriptor rings and DMA
disk
- only domain 0 has direct unchecked access to physical disks; other domains use virtual block devices (VBDs)
- requests may be reordered both within the guest OS and within Xen
- Domains may pass down reorder barriers to prevent reordering
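A sketch of how a reorder barrier constrains request scheduling. Sorting by block number stands in for whatever reordering policy the scheduler actually uses; the `BARRIER` marker and `schedule` function are assumptions made for this illustration.

```python
BARRIER = None  # hypothetical marker for a reorder barrier in the request stream


def schedule(requests):
    """Reorder block requests within each barrier-delimited batch,
    but never move a request across a barrier."""
    ordered, batch = [], []
    for req in requests:
        if req is BARRIER:
            ordered.extend(sorted(batch))  # flush everything before the barrier
            batch = []
        else:
            batch.append(req)
    ordered.extend(sorted(batch))  # flush the final batch
    return ordered
```

For example, `schedule([30, 10, BARRIER, 20, 5])` yields `[10, 30, 5, 20]`: blocks 5 and 20 are sorted among themselves but stay behind the barrier.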
evaluation method
- comparison
- metrics: speed
- workload: benchmarks (macro and micro)
performance overheads
- slower on fork, exec, and sh: they require large numbers of page table updates, which must all be verified by Xen
- context switching time: the guest executes a hypercall to change the page table base
- page fault latency: Xen requires two stages: 1. take the hardware fault and pass the details to the guest OS; 2. install the updated page table entry on the guest OS's behalf
- high level of synchronous disk activity
Paravirtualization today
- OS modification -> hard to take off
- performance: not a big deal