Tuesday, 10 December 2013

Z80 Dhrystones

In the early 80s, my schoolmate David Allery's dad's workplace had a pdp-11/34, a minicomputer designed in the 1970s. All the reports at the time implied that a pdp-11 anything had absolutely awesome performance compared with the humble 8-bit computers of our day.

Yet decades later, when you look at the actual performance figures for a pdp-11/34, it seems pretty unimpressive on paper. You can download the 1979 pdp-11 handbook, which covers it.

First, a brief introduction to the computer's processor, or CPU, which executes the commands that make up programs. I'll assume you understand something of early 80s BASIC. A CPU executes code by reading in a series of numbers from memory, each of which it looks up and translates into switching operations that perform relatively simple instructions. These instructions are at the level of regn=PEEK(addr), POKE(addr,regn), GOTO/GOSUB addr and RETURN; regn = regn +, -, *, divide, and, or, xor or shift regm; and compare regn with regm or a number. Not much else.

The pdp-11 was a family of uniform 16-bit computers with eight 16-bit registers, 16-bit instructions and a 16-bit (64KB) address space (though the 11/34 had built-in bank switching to extend it to 18 bits). The "/number" refers to the particular model.

On the pdp-11/34, an add rn,rm took 2µs, add rn,literalNumber took 3.33µs and add rn,PEEK(addr) took 5.8µs. Branches took 2.2µs and a subroutine call and return took 3.3µs + 3.3µs.
That's not too different to the Z80 in a ZX Spectrum, which can perform a (16-bit) add in 3µs, load a literal then add in 6µs, and load from an address then add in 7.7µs; a branch takes 3.4µs and a subroutine call and return 4.3µs + 2.8µs.

So, let's check this.

A 'classic', simple benchmark is the Dhrystone test, a synthetic benchmark written in 'C'. A VAX 11/780, which ran 1757 dhrystones per second, was defined as having 1 Dhrystone MIPS, and other computers are rated relative to that.

If you do a search, you'll find the pdp-11/34 managed 0.25 Dhrystone MIPS. To compare it with a ZX Spectrum, I compiled a modern version of Dhrystone with SDCC, a modern Z80 'C' compiler, and ran it on a Z80 emulator. The only changes were to comply with modern 'C' syntax: I had to modify the function declarations a little to get it to compile as an ANSI 'C' program. Once it did, I found that it ran 1000 dhrystones in 13 959 168 TStates.

At the Spectrum's clock speed of 3.5MHz, that works out to 0.142 Dhrystone MIPS, or about 57% of the speed of a pdp-11/34. Of course, a more modern pdp-11 compiler might produce a better result for the pdp-11, but at least these results largely agree with my sense that the /34 isn't that much faster :-) !

Compiling Dhrystone with SDCC

SDCC's default Z80 target is basically a Z80 attached to 64KB of RAM. I compiled Dhrystone 2.0 with this command line:

/Developer/sdcc/bin/sdcc -mz80 --opt-code-speed -DNOSTRUCTASSIGN -DREG=register dhry.c -o dhry.hex

SDCC writes its output in Intel Hex format, so I first had to convert it to a raw binary file (using an AVR tool):

avr-objcopy -I ihex dhry.hex -O binary dhry.bin

SDCC also provides the ucSim Z80 simulator, but unfortunately it isn't cycle-accurate (every instruction executes in one 'cycle'). So I wrote a simulated environment for libz80, a Z80 emulation library, which turned out to be quite easy. I used the following command line to run the test:

./rawz80 dhry.bin 0x47d

The arguments are simply the binary file and a breakpoint address; the total number of TStates is printed at the end of the run.

The entire source code is available from the Libby8 Google site (where you can also find out about the FIGnition DIY 8-bit computer).

So Why Did People Feel The Pdp-11 Was So Fast Then?

By rights the pdp-11 shouldn't have been fast at all.
  1. The pdp-11 was typical of the minicomputer generation: the CPU (and everything else) was built from chains of simple, standard TTL logic chips, which weren't very fast.
  2. It was designed for magnetic core memory and that was slow, with a complete read-write cycle taking around 1µs.
  3. It was part of DEC's trend towards more sophisticated processors, which took a lot of logic and slowed it down further.
But (3) is also what made it a winner. Its sophistication meant that it was a joy to program and to develop high-quality programming tools for. That's probably a big part of why the 'C' language was developed on a pdp-11 and the Unix OS grew up on one.

By contrast, although early 8-bit microprocessors were built from custom logic and used faster semiconductor memory, their sophistication was limited by the fabrication technology of the day. So a Z80 had only 7000 transistors and an architecture geared to assembler programming rather than compiled languages.

And there's one other reason. The pdp-11 supported a fairly fast floating-point processor and could execute, for example, a floating-point multiply in typically 5.5µs, something a Z80, which lacks even an integer multiply instruction, certainly can't compete with.