Linux Performance Improvements
This page documents progress on a project to improve Asio performance on Linux 2.6.x.
Goal
The goal of this project is to improve the performance of Asio on Linux, in both single-threaded and multi-threaded uses.
How to participate
If you care about Asio's performance on Linux, then please consider getting involved in this project. There are two ways you can help:
- Supply performance numbers. For the work to be effective, benchmarks are needed from real applications. (No source code required - just the numbers please!)
- Test the changes to ensure they don't introduce bugs.
To start, you need to:
- Check out the baseline version of Asio.
- Benchmark your application.
- Publish your numbers (with a short description) below under Baseline.
- Consider adding a description of your test setup at the bottom of the page.
And then, as new versions are made available:
- Check out the new tag.
- Benchmark again.
- Publish numbers under the tag heading below.
- Report any bugs.
How to get a version of Asio for testing
Work will be done on a branch called linux-perf-branch in Asio's CVS repository. To get a particular version, check out (or update to) the tag as specified. If you want to use Boost.Asio rather than Asio, you will need to run the boostify.pl script in Asio's root directory and copy the content of the boostified directory to your Boost distribution.
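The checkout and copy steps can be sketched as a pair of shell helpers. This is only an illustration: the `ASIO_CVSROOT` variable, the module name `asio`, the Boost path, and the helper names `run`, `fetch_asio_tag` and `copy_boostified` are all assumptions and not part of Asio. Set `DRY_RUN=1` to print the commands instead of executing them.

```shell
#!/bin/bash
# Sketch only. ASIO_CVSROOT, the module name "asio" and the Boost path are
# assumptions; substitute the values for your own setup.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# fetch_asio_tag TAG: check out one tag (for example linux-perf-1) into ./asio.
fetch_asio_tag() {
  run cvs -d "${ASIO_CVSROOT:?set ASIO_CVSROOT first}" checkout -r "$1" asio
}

# copy_boostified BOOST_ROOT: copy the content of the boostified directory
# into a Boost tree. Run boostify.pl in Asio's root directory before this.
copy_boostified() {
  run cp -R asio/boostified/. "$1"/
}
```

A typical sequence would be `fetch_asio_tag linux-perf-1`, then running boostify.pl inside the `asio` directory, then `copy_boostified` with the path of your Boost distribution.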
Progress
Baseline
CVS Tag: linux-perf-branch-start
CK Echo Test 1: 165 MB/s (higher is better)
CK Echo Test 2: 330 MB/s
CK Echo Test 3: 263 MB/s
CK Echo Test 4: 242 MB/s
CK Echo Test 5: 222 MB/s
CK HTTP Test 1: 9460 req/s (higher is better)
Linux-Perf-1
Posts completion handlers generated by the reactor task in a batch, to reduce locking overhead.
CVS Tag: linux-perf-1
CK Echo Test 1: 172 MB/s
CK Echo Test 2: 335 MB/s
CK Echo Test 3: 265 MB/s
CK Echo Test 4: 247 MB/s
CK Echo Test 5: 225 MB/s
CK HTTP Test 1: 9340 req/s
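When comparing a tag against the baseline, the relative change is often easier to read than the raw figures. A minimal sketch, assuming a hypothetical helper named `pct_change` (not part of Asio or its test programs) that takes the baseline value and the new value:

```shell
#!/bin/bash
# pct_change BASELINE NEW: print the relative change in percent,
# rounded to one decimal place. Hypothetical helper for illustration.
pct_change() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

pct_change 165 172     # CK Echo Test 1, baseline vs linux-perf-1: prints 4.2
pct_change 9460 9340   # CK HTTP Test 1, baseline vs linux-perf-1: prints -1.3
```

Positive values mean the tag is faster than the baseline, since higher is better for both MB/s and req/s.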
Linux-Perf-2
Uses an edge-triggered (rather than level-triggered) epoll reactor. However, the reactor no longer performs speculative reads and writes outside the reactor mutex.
CVS Tag: linux-perf-2
CK Echo Test 1: 174 MB/s
CK Echo Test 2: 349 MB/s
CK Echo Test 3: 292 MB/s
CK Echo Test 4: 257 MB/s
CK Echo Test 5: 232 MB/s
CK HTTP Test 1: 9100 req/s
Linux-Perf-3
Eliminates signal blocking while reactor operations are performed.
CVS Tag: linux-perf-3
CK Echo Test 1: 177 MB/s
CK Echo Test 2: 352 MB/s
CK Echo Test 3: 293 MB/s
CK Echo Test 4: 256 MB/s
CK Echo Test 5: 232 MB/s
CK HTTP Test 1: 9040 req/s
Linux-Perf-4
Use per-descriptor operation queues. Use an edge-triggered strategy for interrupting the reactor. Don't explicitly delete descriptors from epoll. Re-enable null_buffers support.
CVS Tag: linux-perf-4
CK Echo Test 1: 177 MB/s
CK Echo Test 2: 343 MB/s
CK Echo Test 3: 291 MB/s
CK Echo Test 4: 259 MB/s
CK Echo Test 5: 233 MB/s
CK HTTP Test 1: 10040 req/s
Linux-Perf-5
Use per-descriptor mutexes.
CVS Tag: linux-perf-5
CK Echo Test 1: 177 MB/s
CK Echo Test 2: 352 MB/s
CK Echo Test 3: 282 MB/s
CK Echo Test 4: 259 MB/s
CK Echo Test 5: 233 MB/s
CK HTTP Test 1: 10006 req/s
Linux-Perf-6
Re-run the reactor immediately if there may be more events available (i.e. the events array was full) and there are other threads processing handlers.
CVS Tag: linux-perf-6
CK Echo Test 1: 177 MB/s
CK Echo Test 2: 354 MB/s
CK Echo Test 3: 290 MB/s
CK Echo Test 4: 257 MB/s
CK Echo Test 5: 233 MB/s
CK HTTP Test 1: 9920 req/s
Linux-Perf-7
Fix a bug preventing null_buffers() support from working. Fix a problem where a task_io_service member variable was being incorrectly accessed outside the lock.
CVS Tag: linux-perf-7
Linux-Perf-8
Change strands so that they share a pool of implementations, to make copying and destruction of strand objects cheaper.
CVS Tag: linux-perf-8
Linux-Perf-9
Use a thread-private handler queue inside run() for running a small number of additional handlers outside of the common handler queue.
CVS Tag: linux-perf-9
Linux-Perf-10
Add support for using timerfd to manage timeouts for timer operations. Ensure items in the thread-private handler queue are moved to the common queue when an exception is thrown.
CVS Tag: linux-perf-10
Linux-Perf-11
Where possible, run the epoll reactor from multiple threads.
CVS Tag: linux-perf-11
CK Echo CPU Scalability:
Test setups
CK Echo Test 1
This test uses the src/tests/performance programs included with non-Boost Asio. It measures total throughput across all sockets in MB/s. The hardware is two Intel Xeon E5310 quad-core processors (1.6 GHz) with 6 GB RAM, running 64-bit Debian Linux.
Asio is configured using:
CXXFLAGS="-O2 -finline-limit=1000"
Server program is run using:
taskset -c 0 ./server 0.0.0.0 55555 1 16384
Client program is run using:
taskset -c 1 ./client localhost 55555 1 16384 1 100
CK Echo Test 2
Configuration as for CK Echo Test 1.
Server program is run using:
taskset -c 0 ./server 0.0.0.0 55555 1 16384
Client program is run using:
taskset -c 1 ./client localhost 55555 1 16384 10 100
CK Echo Test 3
Configuration as for CK Echo Test 1.
Server program is run using:
taskset -c 0 ./server 0.0.0.0 55555 1 16384
Client program is run using:
taskset -c 1 ./client localhost 55555 1 16384 100 100
CK Echo Test 4
Configuration as for CK Echo Test 1.
Server program is run using:
taskset -c 0 ./server 0.0.0.0 55555 1 16384
Client program is run using:
taskset -c 1 ./client localhost 55555 1 16384 1000 100
CK Echo Test 5
Configuration as for CK Echo Test 1.
Server program is run using:
taskset -c 0 ./server 0.0.0.0 55555 1 16384
Client program is run using:
taskset -c 1 ./client localhost 55555 1 16384 10000 100
CK HTTP Test 1
This test uses HTTP Server Example 1 from Asio with ab to measure requests per second. Hardware configuration as for CK Echo Test 1.
Server program is run using:
taskset -c 0 ./http_server 0.0.0.0 8090 ../doc_root
Client program is run using:
taskset -c 1 ab -c 100 -n 1000000 'http://127.0.0.1:8090/data_4K.html'
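To pull the req/s figure out of ab's report for publishing, a small filter can be used. This is a sketch that assumes a hypothetical helper named `ab_req_per_sec` and relies on the "Requests per second:" line that ab prints in its summary.

```shell
#!/bin/bash
# ab_req_per_sec: read ab's report on stdin and print the mean
# requests-per-second value. Hypothetical helper for illustration.
ab_req_per_sec() {
  awk '/^Requests per second:/ { print $4 }'
}

# Usage, with the client command from this test:
#   taskset -c 1 ab -c 100 -n 1000000 'http://127.0.0.1:8090/data_4K.html' | ab_req_per_sec
```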
CK Echo CPU Scalability
This test compares the throughput of a single io_service running on N CPUs (one thread per CPU) against N io_services each running on one CPU. It uses the src/tests/performance programs included with non-Boost Asio. Hardware configuration as for CK Echo Test 1.
RtB Echo Tests
Like the CK Echo tests, except that they are run on two Xeon E5420 quad-core processors (2.5 GHz) with 16 GB RAM, Debian GNU/Linux and GCC 4.3.2. The programs are run using the bash script shown below.
#!/bin/bash
# Stop any server left over from a previous run.
killall server
timeout=100
for bufsize in 16384 32768 65536; do
  for nothreads in 1 2 4; do
    for nosessions in 1 10 100; do
      echo "Bufsize: $bufsize Threads: $nothreads Sessions: $nosessions"
      # Start the server in the background and remember its PID.
      ./server 0.0.0.0 55555 $nothreads $bufsize &
      srvpid=$!
      ./client localhost 55555 $nothreads $bufsize $nosessions $timeout
      kill -9 $srvpid
    done
  done
done
| Buffer | Threads | Sessions | Baseline | perf-1 | perf-2 | perf-3 | perf-4 | perf-5 | perf-6 | perf-7 | perf-8 | perf-10 | perf-11 |
| 16384 | 1 | 1 | 344 | 348 | 272 | 364 | 280 | 385 | 370 | 299 | 485 | 468 | 321 |
| 16384 | 1 | 10 | 686 | 549 | 568 | 734 | 581 | 740 | 745 | 742 | 792 | 817 | 616 |
| 16384 | 1 | 100 | 545 | 495 | 526 | 573 | 568 | 577 | 568 | 625 | 611 | 617 | 571 |
| 16384 | 2 | 1 | 251 | 231 | 248 | 266 | 256 | 254 | 266 | 253 | 275 | 276 | 259 |
| 16384 | 2 | 10 | 671 | 615 | 439 | 423 | 450 | 675 | 662 | 687 | 808 | 677 | 754 |
| 16384 | 2 | 100 | 578 | 577 | 400 | 437 | 411 | 596 | 608 | 620 | 647 | 656 | 649 |
| 16384 | 4 | 1 | 229 | 248 | 239 | 242 | 242 | 232 | 233 | 233 | 244 | 271 | 263 |
| 16384 | 4 | 10 | 444 | 515 | 369 | 369 | 370 | 567 | 594 | 594 | 580 | 627 | 623 |
| 16384 | 4 | 100 | 543 | 589 | 382 | 389 | 421 | 656 | 656 | 655 | 659 | 681 | 571 |
| 32768 | 1 | 1 | 366 | 450 | 375 | 378 | 678 | 676 | n/a (?) | 666 | 387 | 388 | 522 |
| 32768 | 1 | 10 | 701 | 919 | 940 | 947 | 953 | 950 | 719 | 951 | 754 | 745 | 985 |
| 32768 | 1 | 100 | 590 | 542 | 553 | 555 | 558 | 556 | 614 | 556 | 640 | 641 | 596 |
| 32768 | 2 | 1 | 373 | 363 | 375 | 380 | 368 | 379 | 355 | 382 | 396 | 395 | 388 |
| 32768 | 2 | 10 | 638 | 722 | 639 | 590 | 611 | 752 | 803 | 864 | 734 | 784 | 794 |
| 32768 | 2 | 100 | 600 | 610 | 518 | 478 | 510 | 668 | 629 | 645 | 647 | 700 | 742 |
| 32768 | 4 | 1 | 326 | 347 | 316 | 350 | 340 | 318 | 322 | 331 | 337 | 417 | 411 |
| 32768 | 4 | 10 | 588 | 658 | 502 | 488 | 502 | 677 | 686 | 685 | 678 | 703 | 700 |
| 32768 | 4 | 100 | 601 | 618 | 491 | 490 | 507 | 664 | 663 | 659 | 664 | 660 | 597 |
| 65536 | 1 | 1 | 1091 | 1108 | 1123 | 539 | 733 | 538 | 1172 | 536 | 553 | 560 | 748 |
| 65536 | 1 | 10 | 1062 | 1056 | 1082 | 834 | 1089 | 1092 | 1078 | 1097 | 810 | 833 | 867 |
| 65536 | 1 | 100 | 484 | 480 | 475 | 552 | 476 | 478 | 480 | 479 | 571 | 508 | 791 |
| 65536 | 2 | 1 | 417 | 415 | 422 | 484 | 503 | 511 | 704 | 428 | 486 | 716 | 511 |
| 65536 | 2 | 10 | 839 | 863 | 678 | 674 | 676 | 843 | 799 | 856 | 820 | 844 | 854 |
| 65536 | 2 | 100 | 615 | 689 | 516 | 517 | 558 | 657 | 594 | 622 | 618 | 640 | 708 |
| 65536 | 4 | 1 | 422 | 424 | 402 | 408 | 406 | 403 | 402 | 399 | 430 | 472 |  |
| 65536 | 4 | 10 | 763 | 755 | 609 | 612 | 620 | 747 | 755 | 747 | 747 | 775 | 771 |
| 65536 | 4 | 100 | 663 | 683 | 534 | 515 | 523 | 674 | 670 | 671 | 662 | 668 | 681 |
Another test was run on an Intel Core2 Duo T9400 (2.53 GHz).
| Buffer | Threads | Sessions | Baseline | perf-1 | perf-2 | perf-3 | perf-4 | perf-5 | perf-6 | perf-7 | perf-8 | perf-11 |
| 16384 | 1 | 1 | 330 | 343 | 345 | 348 | 353 |  | 355 | 354 | 367 | 383 |
| 16384 | 1 | 10 | 870 | 895 | 938 | 930 | 953 | 956 | 966 | 963 | 1110 | 1093 |
| 32768 | 1 | 1 | 507 | 517 | 520 | 522 |  | 527 | 534 |  |  |  |
| 32768 | 1 | 10 | 1257 | 1274 | 1326 | 1322 | 1321 | 1347 | 1359 | 1355 | 1476 |  |
| 65536 | 1 | 1 |  | 984 | 983 | 898 | 908 | 918 | 918 | 918 |  | 923 |
| 65536 | 1 | 10 | 1652 |  | 1704 | 1703 | 1691 | 1728 | 1710 | 1728 | 1838 | 1845 |
| 131072 | 1 | 1 |  |  |  | 1176 | 1171 | 1172 |  |  | 1283 | 1357 |
| 131072 | 1 | 10 | 1200 | 1343 |  |  |  | 1326 | 1366 | 1370 | 1356 | 1173 |
Date: 18 November 2009
System: Linux 2.6.31-ARCH #1 SMP PREEMPT Tue Nov 10 19:01:40 CET 2009 x86_64 Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz GenuineIntel GNU/Linux
Compiler: GCC 4.4.2
| Buffer | Threads | Sessions | Baseline | perf-1 | perf-2 | perf-3 | perf-4 | perf-5 | perf-6 | perf-7 | perf-8 | perf-9 | perf-10 | perf-11 |
| 16384 | 1 | 1 | 490 | 513 | 527 | 536 | 0 | 554 | 546 | 553 | 0 | 0 | 0 | 592 |
| 16384 | 1 | 10 | 922 | 952 | 1016 | 1019 | 1038 | 1044 | 1042 | 1046 | 0 | 1167 | 1173 | 1180 |
| 16384 | 1 | 100 | 478 | 484 | 498 | 484 | 490 | 491 | 487 | 487 | 0 | 522 | 481 | 526 |
| 16384 | 2 | 1 | 413 | 451 | 452 | 483 | 493 | 482 | 498 | 492 | 459 | 475 | 528 | 588 |
| 16384 | 2 | 10 | 832 | 920 | 985 | 1001 | 1013 | 1022 | 1019 | 1003 | 1152 | 1137 | 1133 | 1000 |
| 16384 | 2 | 100 | 430 | 461 | 497 | 487 | 485 | 455 | 460 | 452 | 515 | 540 | 553 | 645 |
| 16384 | 4 | 1 | 445 | 441 | 0 | 394 | 456 | 385 | 443 | 397 | 492 | 536 | 530 | 575 |
| 16384 | 4 | 10 | 806 | 902 | 952 | 979 | 988 | 1009 | 996 | 1010 | 0 | 1125 | 1137 | 1031 |
| 16384 | 4 | 100 | 400 | 476 | 497 | 496 | 486 | 479 | 479 | 479 | 516 | 525 | 497 | 661 |
| 32768 | 1 | 1 | 684 | 704 | 711 | 741 | 752 | 743 | 748 | 749 | 0 | 779 | 780 | 793 |
| 32768 | 1 | 10 | 1317 | 1332 | 1406 | 1424 | 1441 | 1435 | 1444 | 1418 | 1574 | 1546 | 1559 | 1518 |
| 32768 | 1 | 100 | 473 | 470 | 475 | 471 | 496 | 464 | 492 | 489 | 499 | 516 | 466 | 532 |
| 32768 | 2 | 1 | 595 | 661 | 648 | 649 | 649 | 653 | 654 | 637 | 725 | 710 | 692 | 771 |
| 32768 | 2 | 10 | 1084 | 1309 | 1333 | 1396 | 1406 | 1418 | 1419 | 1278 | 1528 | 1520 | 1512 | 0 |
| 32768 | 2 | 100 | 455 | 468 | 473 | 472 | 485 | 562 | 548 | 505 | 560 | 684 | 495 | 0 |
| 32768 | 4 | 1 | 636 | 615 | 637 | 665 | 678 | 600 | 596 | 643 | 725 | 714 | 699 | 792 |
| 32768 | 4 | 10 | 989 | 1288 | 1352 | 1378 | 1365 | 1390 | 1389 | 1407 | 1526 | 1500 | 1516 | 1401 |
| 32768 | 4 | 100 | 449 | 468 | 474 | 474 | 483 | 476 | 480 | 504 | 543 | 470 | 452 | 699 |
| 65536 | 1 | 1 | 1110 | 1147 | 1156 | 1182 | 1205 | 1197 | 1184 | 1197 | 1287 | 1197 | 1277 | 1307 |
| 65536 | 1 | 10 | 1609 | 1635 | 1467 | 1684 | 1646 | 1689 | 1650 | 1727 | 1811 | 1721 | 1714 | 1643 |
| 65536 | 1 | 100 | 511 | 476 | 497 | 460 | 525 | 517 | 510 | 518 | 528 | 0 | 544 | 591 |
| 65536 | 2 | 1 | 1002 | 1021 | 1070 | 1113 | 1127 | 1122 | 1122 | 1122 | 1208 | 1178 | 1213 | 938 |
| 65536 | 2 | 10 | 1328 | 1660 | 1649 | 1710 | 1713 | 1702 | 1680 | 1712 | 1816 | 1670 | 1644 | 1699 |
| 65536 | 2 | 100 | 462 | 591 | 448 | 527 | 513 | 551 | 532 | 556 | 535 | 0 | 598 | 873 |
| 65536 | 4 | 1 | 1000 | 714 | 737 | 742 | 757 | 715 | 695 | 706 | 745 | 1102 | 0 | 793 |
| 65536 | 4 | 10 | 1304 | 1677 | 1688 | 1731 | 1724 | 1762 | 1738 | 1744 | 1847 | 1770 | 0 | 1689 |
| 65536 | 4 | 100 | 461 | 463 | 455 | 0 | 516 | 525 | 511 | 533 | 541 | 552 | 0 | 741 |
| http | - | - | 19063 | 18053 | 16867 | 17349 | 17611 | 18630 | 17496 | 17538 | 17781 | 17862 | 17775 | 17698 |