Many FreeBSD users (e.g. server admins) want their binaries to run as fast as possible. There are many options for improving the speed of binaries: we can use different compilers and, for each compiler, different optimizations. But which combination is best for which processor?
We have benchmarked the perl binary compiled with gcc from the FreeBSD base system against gcc from ports and the new clang compiler. We have tested different optimizations on 8 different processors, all on the amd64 platform. The benchmark software we used is the perlbench benchmark running on Perl 5.12.3 on top of FreeBSD 8.2. This benchmark can also serve as a reference for users of other scripting languages (e.g. PHP, Python or Ruby), as these use similar structures and methods.
We are benchmarking the speed of the generated binaries, not the speed of compilation, as run-time speed is what matters most to us.
“Compile once, run many.”
Benchmarked compilers:
- gcc 4.2.1 from FreeBSD Base
- gcc 4.5 (or 4.6 for corei7) from the FreeBSD ports tree
- llvm/clang rev. 127334 from the FreeBSD ports tree
Tested optimization flags (depending on processor type):
none, -march=atom, -march=nocona, -march=core2, -march=corei7, -march=opteron-sse3, -march=barcelona
What do the general results look like?
First, based on this benchmark, we can say the following in general:
- clang was on average 10% slower than FreeBSD base gcc (4.2.1) on most of the tested CPUs
- gcc 4.5 was on average 5-10% faster than FreeBSD base gcc (4.2.1) on most of the tested CPUs
The test results are relative: the baseline of 100 is gcc 4.2.1 with CPUTYPE unset.
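To make the base-100 convention concrete, here is a minimal Python sketch of such a normalization. It assumes the raw results are run times in seconds and that the score is scaled so that higher means faster; the post does not state either point. The function name and the sample run times are hypothetical and are not taken from the benchmark.

```python
def relative_score(baseline_seconds, candidate_seconds):
    """Return a score where the baseline is 100 and higher means faster.

    Assumption: inputs are run times, so a shorter run time gives a
    higher score. The original benchmark may use a different convention.
    """
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("run times must be positive")
    return 100.0 * baseline_seconds / candidate_seconds


# Made-up run times, purely for illustration (not benchmark data).
baseline = 20.0   # hypothetical: gcc 4.2.1 with CPUTYPE unset
candidate = 18.0  # hypothetical: another compiler/flag combination

print(round(relative_score(baseline, baseline), 1))
print(round(relative_score(baseline, candidate), 1))
```

Under this convention, a combination that scores above 100 ran faster than the baseline and one that scores below 100 ran slower.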
The following table summarizes the processors tested.
CPU | Family | Recommended for system gcc | Recommended for ports compiler |
---|---|---|---|
Intel Atom D525 | atom | CPUTYPE=core2 (*) | gcc45 -march=atom |
Intel Xeon 3065 | core2 | CPUTYPE=core2 (*) | gcc45 |
Intel Xeon E5310 | core2 | CPUTYPE=core2 (*) | gcc45 -march=core2 |
Intel Xeon E5405 | core2 | no CPUTYPE | gcc45 -march=core2 |
Intel Core i7-920 | nehalem | CPUTYPE=nocona | gcc45 -march=nocona |
Intel Xeon X3450 | nehalem | CPUTYPE=nocona | gcc45 -march=nocona |
Intel Xeon E5620 | nehalem | CPUTYPE=nocona | gcc45 -march=nocona |
AMD Opteron 6128 | barcelona | CPUTYPE=opteron-sse3 | gcc45 -march=barcelona |
(*) with SSSE3 patch
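As a rough illustration of how the system gcc column could be applied, the Python sketch below turns a table row into a line for `/etc/make.conf`. The `CPUTYPE?=` assignment form is the usual FreeBSD convention rather than something stated in this post, and the helper name `make_conf_line` is hypothetical. The values are copied from the table, with the Opteron entry spelled as in the flag list (`opteron-sse3`).

```python
# Recommendations for the base system gcc, copied from the table above.
# None means "leave CPUTYPE unset".
SYSTEM_GCC_CPUTYPE = {
    "Intel Atom D525": "core2",    # (*) requires the SSSE3 patch
    "Intel Xeon 3065": "core2",    # (*) requires the SSSE3 patch
    "Intel Xeon E5310": "core2",   # (*) requires the SSSE3 patch
    "Intel Xeon E5405": None,
    "Intel Core i7-920": "nocona",
    "Intel Xeon X3450": "nocona",
    "Intel Xeon E5620": "nocona",
    "AMD Opteron 6128": "opteron-sse3",
}


def make_conf_line(cpu):
    """Return a make.conf line for the given CPU (hypothetical helper).

    Raises KeyError for CPUs that were not part of the benchmark.
    """
    cputype = SYSTEM_GCC_CPUTYPE[cpu]
    if cputype is None:
        return "# CPUTYPE left unset"
    return "CPUTYPE?=" + cputype


for cpu in SYSTEM_GCC_CPUTYPE:
    print(cpu + ": " + make_conf_line(cpu))
```

The lookup deliberately fails for unlisted processors instead of guessing, since the results above show that a neighbouring family's setting can be slower.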
Did we see any surprises? Yes, here they are:
- Core i7 based processors run slower with -march=core2 (new option) on the system compiler than with -march=nocona
- For Core i7, the new optimization -march=corei7 from gcc 4.6 is still slower on average than -march=nocona
- On the Intel Atom, -march=nocona hurts performance in many tests with both base gcc and ports gcc
- New AMD processors perform best with -march=opteron-sse3 if using the base compiler (otherwise -march=barcelona)
The full benchmark results are available at this URL:
http://www.vx.sk/benchmarks/perlbench/20110311