This page describes the scheduling policy of processes across the different CPUs available on a SEAPATH hypervisor.

SEAPATH default CPU isolation

SEAPATH aims to host virtual machines with real-time needs. To achieve that, process scheduling must be tuned to offer the best performance to the VMs.

...

Info
In the Ansible inventory of the hypervisors, these CPUs are defined by the `isolcpus` variable.

Tuned

The Debian version of SEAPATH uses tuned (https://github.com/redhat-performance/tuned)

...

On Yocto, tuned is not used. Instead, all these configurations are done at compile time.

Scheduling virtual machines

SEAPATH virtual machines are managed by Qemu.

...

By default, all these threads are managed by the Linux scheduler and thus run on the non-isolated cores. They can also be pinned to specific CPUs, which forces them to run only on those CPUs.

Standard virtual machines

For a VM without any performance or real-time needs, there is no need to handle any of the Qemu threads in a particular way:

  • All threads will inherit a default priority and scheduling type (TS 19)

  • All threads will be handled by the Linux scheduler on the non-isolated cores

Real time virtual machines

For a VM where performance and determinism are needed, here are our recommendations:

...

The vCPU scheduler type has to be FIFO (FF). A real-time priority of 1 is enough.
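As a minimal sketch of this recommendation, the scheduler type and priority can be expressed in the libvirt domain XML as follows. The vCPU index 0 is an assumption made for illustration; use the indexes of the vCPUs of your own VM.

```xml
<cputune>
  <!-- vCPU 0 (assumed index) runs with the FIFO scheduler at real-time priority 1 -->
  <vcpusched vcpus='0' scheduler='fifo' priority='1'/>
</cputune>
```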

For more information, read the page Virtual machines on SEAPATH.

Finer control with cgroup (optional)

Implementation in SEAPATH

The Linux kernel uses cgroups to isolate processes. These cgroups work in a hierarchy where each layer restricts the resources a process can access. Systemd also uses this mechanism by grouping its processes in slices.

...

TODO : put the link to the inventories README once written

Utility of slices CPU isolation

These slices provide a preset CPU isolation for virtual machines. A VM placed in either the machine-rt or the machine-nort slice is automatically scheduled on the CPUs of that slice.
This is particularly useful when deploying many VMs at once.

...

Info
This new isolation layer protects against very advanced attacks. Because it has drawbacks (see below), whether you should activate this feature remains an open question.

Drawbacks

When CPU isolation is activated on the machine slice, the management threads of the VM are scheduled on the CPUs allowed by the slice. This has two implications:

...

Info
The management thread scheduling is handled by the `emulatorpin` field in the libvirt XML.
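For illustration, a minimal sketch of this field is shown below. The CPU range 0-1 is an assumption; it stands for CPUs that belong to the allowed CPU list of the slice on your hypervisor.

```xml
<cputune>
  <!-- Management (emulator) threads restricted to host CPUs 0-1 (assumed values) -->
  <emulatorpin cpuset='0-1'/>
</cputune>
```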

For more information, read the page Virtual machines on SEAPATH.

Specific configurations

NUMA

NUMA (Non-Uniform Memory Access) refers to machines that can contain several CPU sockets. Each of these sockets has its own caches and its own locally attached memory, which means that accessing memory attached to another socket is slower than accessing local memory.

...

If your system contains more than one NUMA cell, you must be careful to pin all the vCPU threads of one VM on the same NUMA cell. Otherwise, the data transfer between two cells will significantly slow down the VM.
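The fragment below is a sketch of such a pinning for a VM with two vCPUs. It assumes that host CPUs 2 and 3 both belong to NUMA cell 0, which has to be checked on your own hardware. The `numatune` element is not described on this page; it is shown only as an optional libvirt setting that keeps the guest memory on the same cell.

```xml
<vcpu placement='static'>2</vcpu>
<cputune>
  <!-- Both vCPUs are pinned to host CPUs assumed to be in NUMA cell 0 -->
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='3'/>
</cputune>
<numatune>
  <!-- Optional, not covered by this page: allocate guest memory from cell 0 only -->
  <memory mode='strict' nodeset='0'/>
</numatune>
```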

Hyper-threading

Most modern CPUs support hyper-threading. This option can be enabled in the BIOS and doubles the number of logical CPUs available on the system. However, the two logical CPUs of a physical core share its execution resources, so they are not as fast or as independent as separate physical cores.

...

Info
On most systems, logical CPUs are grouped in numerical order (0 with 1, 2 with 3 …) but this is not always the case. Always refer to `virsh capabilities` to check the exact architecture.
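As an illustration, the host topology part of the `virsh capabilities` output looks roughly like the trimmed excerpt below. All values are made up for this example, and the real output contains more elements and attributes that vary with the libvirt version. The `siblings` attribute lists the logical CPUs that share a physical core; here the grouping is not in numerical order (0 with 2, 1 with 3).

```xml
<topology>
  <cells num='1'>
    <cell id='0'>
      <cpus num='4'>
        <!-- Logical CPUs 0 and 2 share core 0, logical CPUs 1 and 3 share core 1 -->
        <cpu id='0' socket_id='0' core_id='0' siblings='0,2'/>
        <cpu id='1' socket_id='0' core_id='1' siblings='1,3'/>
        <cpu id='2' socket_id='0' core_id='0' siblings='0,2'/>
        <cpu id='3' socket_id='0' core_id='1' siblings='1,3'/>
      </cpus>
    </cell>
  </cells>
</topology>
```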

Annex: list of tuned modifications

Below is a list of all scheduling modifications made by tuned.

...

  • /sys/module/kvm/parameters/halt_poll_ns = 0
    /sys/kernel/ktimer_lockless_check = 1
    /sys/kernel/mm/ksm/run = 2

  • Kernel parameters:
    isolcpus=managed_irq,domain,{isolated_cores}
    intel_pstate=disable
    nosoftlockup
    tsc=reliable
    nohz=on
    nohz_full={isolated_cores}
    rcu_nocbs={isolated_cores}
    irqaffinity={non_isolated_cores}
    processor.max_cstate=1
    intel_idle.max_cstate=1
    cpufreq.default_governor=performance
    rcu_nocb_poll

  • Kernel thread priorities:
    group.ksoftirqd=0:f:2:*:^\[ksoftirqd
    group.ktimers=0:f:2:*:^\[ktimers
    group.rcuc=0:f:4:*:^\[rcuc
    group.rcub=0:f:4:*:^\[rcub
    group.ktimersoftd=0:f:3:*:^\[ktimersoftd

  • Configures irqbalance with the isolated_cores list

  • Configures workqueue with the isolated_cores list

  • kernel.hung_task_timeout_secs = 600
    kernel.nmi_watchdog = 0
    kernel.sched_rt_runtime_us = -1
    vm.stat_interval = 10
    kernel.timer_migration = 0

VM configuration

The official documentation on the XML format of libvirt is here.

Resources

In the XML configuration of a virtual machine, the resource element specifies which slice the VM should be placed in (more details here). The virtual machine then only has access to the CPUs associated with that slice.

Possible values:

  • /machine/nort
  • /machine/rt

Example for a real-time virtual machine:

```xml
<resource>
	<partition>/machine/rt</partition>
</resource>
```

CPU tuning

In the project, the cputune element is used to restrict the CPUs a virtual machine can use (more details here).

  • The emulatorpin element specifies which of the host's physical CPUs the emulator (the subset of a domain that does not include vCPU or iothreads) will be pinned to.
  • The vcpupin element specifies which of the host's physical CPUs the domain vCPU will be pinned to. It is used to reserve one or more CPUs for a critical virtual machine, so it is important not to use these CPUs for another VM.
  • The vcpusched element specifies the scheduler type for a particular vCPU. A priority can also be set. In the project, priorities greater than 10 are for the host, a priority of 10 is for the RCU, and priorities less than 10 are used to order the RT vCPUs among themselves.
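
The three elements above can be combined as in the following sketch. All CPU numbers and priorities are assumptions chosen for illustration: host CPUs 0-1 stand for non-isolated CPUs, host CPUs 2 and 3 for isolated CPUs reserved for this VM, and the two priorities (both below 10) rank the vCPUs relative to each other.

```xml
<cputune>
  <!-- Emulator threads stay on host CPUs 0-1 (assumed non-isolated) -->
  <emulatorpin cpuset='0-1'/>
  <!-- Each vCPU gets its own host CPU (assumed isolated, not used by any other VM) -->
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='3'/>
  <!-- FIFO scheduling; vCPU 0 is given a higher real-time priority than vCPU 1 -->
  <vcpusched vcpus='0' scheduler='fifo' priority='2'/>
  <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
</cputune>
```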