FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang

Many FreeBSD users (e.g. server admins) want their binaries to run as fast as possible. There are many ways to improve the speed of binaries: we can use different compilers and, for each compiler, different optimizations. But which combination is best for which processor?

We benchmarked the perl binary compiled with gcc from the FreeBSD base system against gcc from ports and the new clang compiler. We tested different optimizations on 8 different processors, all on the amd64 platform. The benchmark software is perlbench, running on Perl 5.12.3 on top of FreeBSD 8.2. The results can also serve as a reference for users of other scripting languages (e.g. PHP, Python or Ruby), as these use similar structures and methods.

We are benchmarking the speed of the generated binaries, not the speed of compilation, as this is what matters most to us:
“Compile once, run many.”

Benchmarked compilers:

  1. gcc 4.2.1 from FreeBSD Base
  2. gcc 4.5 (or 4.6 for corei7) from the FreeBSD ports tree
  3. llvm/clang rev. 127334 from the FreeBSD ports tree

Tested optimization flags (depending on processor type):
none, -march=atom, -march=nocona, -march=core2, -march=corei7, -march=opteron-sse3, -march=barcelona

What do the general results look like?
Based on this benchmark, we can say the following in general:

  • clang was on average 10% slower than FreeBSD base gcc (4.2.1) on most of the tested CPUs
  • gcc 4.5 was on average 5-10% faster than FreeBSD base gcc on most of the tested CPUs

The test results are relative: the base value of 100 is gcc 4.2.1 with CPUTYPE unset.
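
To make the base-100 scale concrete, here is a minimal sketch of how such relative scores could be derived from raw run times. It assumes that a score is a speed ratio, so a shorter run time gives a higher score. The function name `relative_scores` and the run times are hypothetical and for illustration only; they are not measured data from this benchmark.

```python
def relative_scores(times, baseline):
    """Return scores scaled so that the baseline configuration is 100.

    Assumption: a score is a speed ratio, so a shorter run time
    yields a higher score.
    """
    base_time = times[baseline]
    return {name: 100.0 * base_time / t for name, t in times.items()}


# Hypothetical run times in seconds, for illustration only (not measured data).
example_times = {
    "gcc42 (no CPUTYPE)": 20.0,
    "gcc45 -march=core2": 18.0,
    "clang": 22.0,
}

scores = relative_scores(example_times, baseline="gcc42 (no CPUTYPE)")
for name, score in scores.items():
    print(f"{name}: {score:.1f}")
```

With this convention the baseline always comes out at exactly 100, and any configuration scoring above 100 is faster than base gcc with CPUTYPE unset.
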
The following table summarizes the processors tested.

CPU                 Family      Recommended system gcc     Recommended ports compiler
Intel Atom D525     atom        CPUTYPE=core2 (*)          gcc45 -march=atom
Intel Xeon 3065     core2       CPUTYPE=core2 (*)          gcc45
Intel Xeon E5310    core2       CPUTYPE=core2 (*)          gcc45 -march=core2
Intel Xeon E5405    core2       no CPUTYPE                 gcc45 -march=core2
Intel Core i7-920   nehalem     CPUTYPE=nocona             gcc45 -march=nocona
Intel Xeon X3450    nehalem     CPUTYPE=nocona             gcc45 -march=nocona
Intel Xeon E5620    nehalem     CPUTYPE=nocona             gcc45 -march=nocona
AMD Opteron 6128    barcelona   CPUTYPE=opteron-sse3       gcc45 -march=barcelona

(*) with SSSE3 patch
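
As a rough illustration of how the system gcc column could be applied, the sketch below encodes the table's recommendations as a lookup and prints the matching line for /etc/make.conf. The `CPUTYPE?=` form follows the usual make.conf convention. The dictionary and the helper `make_conf_line` are hypothetical names introduced here, and the sketch does not handle the SSSE3 patch noted above.

```python
# Recommendations for the base system gcc, copied from the table above.
# None means the table recommends leaving CPUTYPE unset.
# "opteron-sse3" is spelled as in the list of tested flags.
# The core2 entries marked (*) in the table additionally need the SSSE3 patch.
RECOMMENDED_CPUTYPE = {
    "Intel Atom D525": "core2",
    "Intel Xeon 3065": "core2",
    "Intel Xeon E5310": "core2",
    "Intel Xeon E5405": None,
    "Intel Core i7-920": "nocona",
    "Intel Xeon X3450": "nocona",
    "Intel Xeon E5620": "nocona",
    "AMD Opteron 6128": "opteron-sse3",
}


def make_conf_line(cpu):
    """Return the make.conf line for a tested CPU, or "" if CPUTYPE stays unset."""
    cputype = RECOMMENDED_CPUTYPE[cpu]
    if cputype is None:
        return ""
    return f"CPUTYPE?={cputype}"


print(make_conf_line("Intel Core i7-920"))
```

Only the eight tested processors are covered; for any other CPU the lookup raises a `KeyError` rather than guessing a value.
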

Did we see any surprises? Yes, here they are:

  • Core i7 based processors run slower with -march=core2 (new option) on the system compiler than with -march=nocona
  • For Core i7, the new optimization -march=corei7 from gcc 4.6 is still slower on average than -march=nocona
  • On the Intel Atom -march=nocona hurts performance in many tests with both base gcc and ports gcc
  • New AMD processors perform best with -march=opteron-sse3 if using the base compiler (otherwise -march=barcelona)

The full benchmark results are available at this URL:
http://www.vx.sk/benchmarks/perlbench/20110311
