It’s time for another small blog about the ODROID-XU4.
This is just a quick tip to improve your network and USB performance even more. It optimizes the hardware interrupt (IRQ) affinity on your ODROID-XU4.
This guide is for the 3.10.y kernel and Debian 8. On other kernel versions the interrupts may have different numbers.
Description
Whenever a piece of hardware, such as a disk controller or Ethernet card, needs attention from the CPU, it throws an interrupt. The interrupt tells the CPU that something has happened and that the CPU should drop what it’s doing to handle the event. In order to prevent multiple devices from sending the same interrupts, the IRQ system was established where each device in a computer system is assigned its own special IRQ so that its interrupts are unique. Starting with the 2.4 kernel, Linux has gained the ability to assign certain IRQs to specific processors (or groups of processors). This is known as SMP IRQ affinity, and it allows you to control how your system will respond to various hardware events. It allows you to restrict or repartition the workload that your server must do so that it can more efficiently do its job.
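To make the "groups of processors" part concrete: the kernel exposes the affinity of each IRQ in two formats, a hexadecimal bitmask in /proc/irq/<n>/smp_affinity and a plain CPU list in /proc/irq/<n>/smp_affinity_list. Here is a minimal sketch of how a CPU list maps to the bitmask. The helper name cpu_list_to_mask is my own and not a standard tool.

```shell
#!/bin/sh
# Convert a CPU list such as "4-6" or "0,4-5" into the hexadecimal bitmask
# format used by /proc/irq/<n>/smp_affinity.
# cpu_list_to_mask is a hypothetical helper name, not part of any package.
cpu_list_to_mask() {
    mask=0
    old_ifs=$IFS
    IFS=,                          # split the list on commas
    for range in $1; do
        first=${range%-*}          # "4-6" -> 4, "4" -> 4
        last=${range#*-}           # "4-6" -> 6, "4" -> 4
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            mask=$(( mask | (1 << cpu) ))   # set the bit for this CPU
            cpu=$(( cpu + 1 ))
        done
    done
    IFS=$old_ifs
    printf '%x\n' "$mask"
}

cpu_list_to_mask 4      # CPU4 only: prints 10
cpu_list_to_mask 4-6    # CPU4 to CPU6: prints 70
cpu_list_to_mask 0-7    # all eight cores: prints ff
```

The list format is easier to read, which is why the rest of this post sticks to smp_affinity_list.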
Source
It’s generally a good idea to spread your interrupts evenly across all CPUs. In my case, however, I want the best performance possible, so I use the faster A15 CPU cluster for all important interrupt handling.
There are basically 3 different interrupts on a headless ODROID-XU4 server you should take into consideration:
- the USB2 port
- the first USB3 port
- the second USB3 port (the 1 Gigabit ethernet adapter is connected to this one)
By default all 3 interrupts for these devices are handled by CPU0, which is an A7 core, as you can see in the output below:
lscpu -e
CPU SOCKET CORE ONLINE MAXMHZ    MINMHZ
0   0      0    yes    1400.0000 200.0000
1   0      1    yes    1400.0000 200.0000
2   0      2    yes    1400.0000 200.0000
3   0      3    yes    1400.0000 200.0000
4   1      4    yes    2000.0000 200.0000
5   1      5    yes    2000.0000 200.0000
6   1      6    yes    2000.0000 200.0000
7   1      7    yes    2000.0000 200.0000

grep -E 'CPU0|usb' /proc/interrupts
         CPU0     CPU1     CPU2     CPU3     CPU4     CPU5     CPU6     CPU7
103:        1        0        0        0        0        0        0        0  GIC  ehci_hcd:usb1, ohci_hcd:usb2
104:    12853        0        0        0        0        0        0        0  GIC  xhci-hcd:usb3
105:     7489        0        0        0        0        0        0        0  GIC  xhci-hcd:usb5
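Since the IRQ numbers can differ between kernel versions, it may be safer to look them up by name instead of hard-coding 103 to 105. The following is only a sketch: irq_for is a hypothetical helper of mine, and the optional second argument (a file name, or - for standard input) exists only so it can be tried on saved output.

```shell
#!/bin/sh
# Print the IRQ number of the first line in an interrupts table whose
# text contains the given name.
# irq_for is a hypothetical helper; the second argument defaults to
# /proc/interrupts and may be "-" to read from standard input.
irq_for() {
    awk -v name="$1" '
        index($0, name) {
            sub(/:$/, "", $1)   # "104:" -> "104"
            print $1
            exit
        }' "${2:-/proc/interrupts}"
}

# Try it on two lines copied from the output above.
sample='104:    12853  0  0  0  0  0  0  0  GIC  xhci-hcd:usb3
105:     7489  0  0  0  0  0  0  0  GIC  xhci-hcd:usb5'
printf '%s\n' "$sample" | irq_for xhci-hcd:usb3 -   # prints 104
printf '%s\n' "$sample" | irq_for xhci-hcd:usb5 -   # prints 105
```

On the board itself you would call it without the second argument, for example irq_for xhci-hcd:usb5.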
IRQ Tuning
First of all make sure that automatic IRQ balancing is disabled:
systemctl disable irqbalance
On Debian, add the following to your /etc/rc.local file to pin the interrupt handling to A15 cores 4-6 (CPU4-6). If your rc.local ends with an exit 0 line, place the commands before it:
# Move USB and network irqs to A15 CPU cluster
# usb2
echo 6 > /proc/irq/103/smp_affinity_list
# usb3
echo 5 > /proc/irq/104/smp_affinity_list
# network (usb3)
echo 4 > /proc/irq/105/smp_affinity_list
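If an IRQ number does not exist on your kernel, the plain echo lines above simply fail. A slightly more defensive variant could check the target first. This is a sketch under my own assumptions: pin_irq is a hypothetical helper, and its optional third argument (a base directory replacing /proc/irq) is there only so the function can be tried safely on a scratch directory.

```shell
#!/bin/sh
# pin_irq IRQ CPULIST [BASEDIR]
# Write CPULIST to BASEDIR/IRQ/smp_affinity_list if that file is writable.
# pin_irq and the BASEDIR argument are hypothetical; BASEDIR defaults to
# /proc/irq and exists only to allow a dry run outside of /proc.
pin_irq() {
    target="${3:-/proc/irq}/$1/smp_affinity_list"
    if [ -w "$target" ]; then
        echo "$2" > "$target"
    else
        echo "IRQ $1 not found or not writable, skipping" >&2
        return 1
    fi
}

# Dry run on a scratch directory instead of the real /proc/irq.
scratch=$(mktemp -d)
mkdir -p "$scratch/105"
: > "$scratch/105/smp_affinity_list"
pin_irq 105 4 "$scratch"
cat "$scratch/105/smp_affinity_list"   # prints 4
rm -rf "$scratch"
```

In rc.local the calls would then read pin_irq 103 6, pin_irq 104 5 and pin_irq 105 4, matching the echo lines above.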
After a reboot and some file transfer you should see something like this:
grep -E 'CPU0|usb' /proc/interrupts
         CPU0     CPU1     CPU2     CPU3     CPU4     CPU5     CPU6     CPU7
103:        1        0        0        0        0        0        0        0  GIC  ehci_hcd:usb1, ohci_hcd:usb2
104:     8355        0        0        0        0   249689        0        0  GIC  xhci-hcd:usb3
105:      436        0        0        0  4396187        0        0        0  GIC  xhci-hcd:usb5
Note the numbers for CPU4 and CPU5. CPU0 handled some initial interrupts during boot, because rc.local isn’t executed immediately.
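If you don't want to read the columns by eye, a small awk filter can report which CPU has handled the most interrupts for a given IRQ. Again just a sketch: busiest_cpu is a hypothetical helper of mine, and the optional second argument (a file name, or - for standard input) is only there for trying it on saved output.

```shell
#!/bin/sh
# busiest_cpu IRQ [FILE]
# Print the CPU column with the highest count for the given IRQ number.
# busiest_cpu is a hypothetical helper; FILE defaults to /proc/interrupts
# and may be "-" to read from standard input.
busiest_cpu() {
    awk -v irq="$1:" '
        $1 == irq {
            best = 2                          # field 2 is the CPU0 counter
            for (i = 3; i <= NF && $i ~ /^[0-9]+$/; i++)
                if ($i + 0 > $best + 0)
                    best = i
            print "CPU" (best - 2)
            exit
        }' "${2:-/proc/interrupts}"
}

# Two lines copied from the output above.
sample='104:     8355  0  0  0  0  249689  0  0  GIC  xhci-hcd:usb3
105:      436  0  0  0  4396187  0  0  0  GIC  xhci-hcd:usb5'
printf '%s\n' "$sample" | busiest_cpu 104 -   # prints CPU5
printf '%s\n' "$sample" | busiest_cpu 105 -   # prints CPU4
```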
Benchmarks
Tuning without measuring performance before and afterwards is useless. So, here are some iperf results:
# without irq tuning
iperf -c 192.168.0.2 -i 2 -r
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 192.168.0.2, TCP port 5001
TCP window size: 272 KByte (default)
------------------------------------------------------------
[ 5] local 192.168.0.121 port 57696 connected with 192.168.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[ 5]  0.0- 2.0 sec   198 MBytes   830 Mbits/sec
[ 5]  2.0- 4.0 sec   198 MBytes   830 Mbits/sec
[ 5]  4.0- 6.0 sec   201 MBytes   842 Mbits/sec
[ 5]  6.0- 8.0 sec   199 MBytes   835 Mbits/sec
[ 5]  8.0-10.0 sec   199 MBytes   835 Mbits/sec
[ 5]  0.0-10.0 sec   995 MBytes   834 Mbits/sec
[ 4] local 192.168.0.121 port 5001 connected with 192.168.0.2 port 41073
[ 4]  0.0- 2.0 sec   206 MBytes   865 Mbits/sec
[ 4]  2.0- 4.0 sec   207 MBytes   870 Mbits/sec
[ 4]  4.0- 6.0 sec   210 MBytes   881 Mbits/sec
[ 4]  6.0- 8.0 sec   211 MBytes   883 Mbits/sec
[ 4]  8.0-10.0 sec   210 MBytes   882 Mbits/sec
[ 4]  0.0-10.0 sec  1.02 GBytes   876 Mbits/sec

# with irq tuning
iperf -c 192.168.0.2 -i 2 -r
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 192.168.0.2, TCP port 5001
TCP window size: 289 KByte (default)
------------------------------------------------------------
[ 5] local 192.168.0.121 port 57702 connected with 192.168.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[ 5]  0.0- 2.0 sec   224 MBytes   941 Mbits/sec
[ 5]  2.0- 4.0 sec   223 MBytes   936 Mbits/sec
[ 5]  4.0- 6.0 sec   223 MBytes   935 Mbits/sec
[ 5]  6.0- 8.0 sec   223 MBytes   937 Mbits/sec
[ 5]  8.0-10.0 sec   223 MBytes   934 Mbits/sec
[ 5]  0.0-10.0 sec  1.09 GBytes   936 Mbits/sec
[ 4] local 192.168.0.121 port 5001 connected with 192.168.0.2 port 41076
[ 4]  0.0- 2.0 sec   219 MBytes   920 Mbits/sec
[ 4]  2.0- 4.0 sec   220 MBytes   924 Mbits/sec
[ 4]  4.0- 6.0 sec   220 MBytes   924 Mbits/sec
[ 4]  6.0- 8.0 sec   220 MBytes   924 Mbits/sec
[ 4]  8.0-10.0 sec   220 MBytes   924 Mbits/sec
[ 4]  0.0-10.0 sec  1.08 GBytes   923 Mbits/sec
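About 100 Mbit/s faster in the first direction (834 to 936 Mbits/sec) and close to 50 Mbit/s faster in the reverse direction (876 to 923 Mbits/sec). Not bad for such an easy fix 🙂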
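To put the gain into relative terms, the 10-second averages from the iperf runs above can be fed into a one-line awk calculation. The helper name gain_percent is my own; the four numbers are taken straight from the results.

```shell
#!/bin/sh
# gain_percent BEFORE AFTER
# Print the relative improvement from BEFORE to AFTER in percent.
# gain_percent is a hypothetical helper name.
gain_percent() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}

gain_percent 834 936   # first direction: prints 12.2
gain_percent 876 923   # reverse direction: prints 5.4
```

So the tuning is worth roughly 12 % in one direction and 5 % in the other on my setup.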
Read my post in the ODROID forum to get some more information and tuning tips.