The computation code is most likely limited by its divide and sqrt operations; the remainder of the code is roughly 24 addition/multiplication operations. Memory lookup may instead be the limiting factor; without extensive benchmarking using instruction counters, it is difficult to determine which is the bottleneck.
utah-g3d is available here; it is released under the GPL.
OS: Linux 2.4.7 / Debian
RAM: 512MB Physical
Benchmarks:
  utah-g3d "speed test": 58.12s

OS: Linux 2.4.7 SMP / Debian
RAM: 512MB Physical
Benchmarks (1 of 2 CPUs used for benchmark):
  utah-g3d "speed test": 49.51s
  utah-g3d compute code; 365625 blocks @ 14625 points (5.3469e9 block calculations): 1011.22s

OS: Linux 2.4.0-test9 / Debian
RAM: 128MB Physical
Benchmarks:
  utah-g3d "speed test": 95.97s

OS: Linux 2.4.0-test6 SMP - 1 CPU used
RAM: 384MB Physical
Benchmarks:
  utah-g3d "speed test": 227.97s

OS: Linux 2.2.16 SMP - 1 CPU used
RAM: 512MB Physical
Benchmarks:
  utah-g3d "speed test": 174.65s

OS: Solaris 2.7
RAM: 256 MB?
Benchmarks:
  utah-g3d "speed test": 142.0s

OS: Linux 2.6.32 SMP - 1 of 4 cores
RAM: 8192 MB
Benchmarks:
  utah-g3d "speed test": 10.8s
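
To make the timings listed above easier to compare, the short Python sketch below expresses each "speed test" time as a multiple of the fastest one, and converts the one compute-code run into block calculations per second. The numbers are copied from the entries above. The OS strings are used as labels only because the entries carry no other identifier, and the helper names (relative_to_fastest, throughput) are made up for this illustration; they are not part of utah-g3d.

```python
# "Speed test" wall-clock times in seconds, copied from the entries above.
# Labels are the OS strings from each entry (hypothetical identifiers).
speed_test_s = [
    ("Linux 2.4.7 / Debian", 58.12),
    ("Linux 2.4.7 SMP / Debian", 49.51),
    ("Linux 2.4.0-test9 / Debian", 95.97),
    ("Linux 2.4.0-test6 SMP", 227.97),
    ("Linux 2.2.16 SMP", 174.65),
    ("Solaris 2.7", 142.0),
    ("Linux 2.6.32 SMP", 10.8),
]


def relative_to_fastest(entries):
    """Return (label, time / fastest_time) pairs; the fastest entry gets 1.0."""
    fastest = min(seconds for _, seconds in entries)
    return [(label, seconds / fastest) for label, seconds in entries]


def throughput(work_units, seconds):
    """Return work units completed per second."""
    return work_units / seconds


ratios = relative_to_fastest(speed_test_s)
for label, ratio in ratios:
    print(f"{label:28s} {ratio:6.2f}x the fastest time")

# Compute-code run: block-calculation count and time as reported above.
rate = throughput(5.3469e9, 1011.22)
print(f"compute code: {rate:.3e} block calculations per second")
```

The ratios only compare wall-clock times for the same test; they say nothing about whether the divide/sqrt operations or memory access is the limiting factor on any given machine.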