Many thanks to Greg Edlund of IBM for organizing the DesignCon 2004 panel discussion, "Establishing Pass-Fail Criteria for High-Speed Digital Interfaces", and for inviting me to make this 5-minute presentation alongside other presentations from Edward Sayre, Robert Haller, and Dale Becker.
If you've ever been to Las Vegas you probably saw a "quarter pusher" arcade game. In this game the machine holds a flat tray chock full of quarters. The quarters are piled so high that some dangle precariously over the edge of the tray, just beyond your reach inside a heavy glass casing.
As you drop your coins one at a time into the game, the machine wiggles and pushes your new coins into the pile from the back side, pushing the remaining pile of other coins forward, moving them closer and closer to the point where a clot of quarters might cascade over the front edge of the tray, out the chute, and into your waiting arms.
To maximize the visual effect, the game is designed so that it pushes a lot of coins very, very close to the edge without actually tipping that many over... That game reminds me of the way we sometimes design high-speed digital circuitry.
Assuming all elements of risk are included, worst-case design:
- Becomes infinitely complex,
- Over-designs your system, and
- Is so burdensome that nobody ever really does it.
Those of you at work today on cutting-edge systems understand that the name of the game in the high-performance business is to provide as much performance as possible, at a reasonable cost, and subject to the ironclad rule that the system must be reliable. As you increase performance (for example: turn up the clock) every system inevitably plummets off the edge of a reliability cliff. It is your responsibility as a designer to make your system run as close as possible to this cliff without ever actually falling off.
Experienced designers do this by:
- Ignoring some elements of the design,
- Assuming values for unknown parameters,
- Building a "best guess" system,
- Measuring the finished system, and
- Adjusting the design until the measurements look OK.
Does that sound haphazard to you? Let's think together for a minute about how worst-case design really works. For example, how would you evaluate the safety of the picnic-goers in this picture?
A mathematician might want to calculate the center of gravity of the boulder by measuring the geometrical shape of the boulder, and estimating the distribution of density inside. Then he would plan to measure the size of the bearing surface touching the ground, and calculate the moment required to tip the boulder sufficiently so that the center of gravity moves beyond the footprint of the bearing surface. Finally, he would compare the calculated moment to the force of the expected wind, etc.
Would you do it that way? Sure, the mathematical procedure works, but nobody can tell you the exact shape of the boulder (unless you have a really big 3-D digitizer) - and nobody knows what's included inside. Neither do we know exactly how the boulder bears on the ground, or the maximum wind pressure. Since none of the variables are known, our hypothetical mathematician is going to have to make conservative guesses at everything, and as a result I'll bet he concludes that the rock is not safe.
Yet it's been standing for a long, long time.
If it were your job to make the rock safe, you'd probably go get a big backhoe and then push on the rock to see if it moved. Then you'd pour some concrete around the bottom and push on it again until you could push really hard without moving it. That's the way real design sometimes has to work—we take measurements and then correct as needed.
I'm not suggesting that worst-case design is useless. Far from it. A worst-case budget is usually where projects start.
I'm just saying that worst-case budgets rarely work, for two very good reasons: either you didn't include all the necessary factors, or you made wrong assumptions to fill in gaps in the available data.
To design efficient, high-performance designs that never fall off the cliff you must combine your system budget with a lot of measurements. The measurements tell you how the budget is working right now, and the budget tells you what would happen if things got a little worse tomorrow. Between the two approaches, you have all the information you need. My two-stage approach requires two things:
- Adequate laboratory equipment, and adequate laboratory practices (like low-capacitance probes with really short ground wires)
- Good, basic knowledge about what parameters matter the most, and how they can be adjusted
Example of Adequate Measurement
Figure 1 shows an eye pattern measured the way I think all SERDES links should be measured. In this picture the SERDES itself is showing you the eye as measured at its actual slicer internal to the chip, after all packaging considerations, and including any bandwidth-limiting preamplifier and AGC stages.
The eye in this picture is captured from a functioning system using a transceiver from Accelerant Networks (2004). The data rate is 10 Gb/s, operating over 1 meter of ordinary FR-4 backplane. The use of PAM-4 coding (4 levels, or two bits, per data baud) reduces the required data baud rate, improving jitter, and also reduces the maximum slew rate on the transmitted signal, improving crosstalk.
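To make the coding idea concrete, here is a minimal sketch of a PAM-4 encoder. The Gray-coded level mapping is a common convention, assumed here for illustration; it is not a description of the actual Accelerant Networks encoder.

```python
# Hypothetical PAM-4 illustration: each pair of bits becomes one of four
# signal levels, so the baud rate is half the bit rate.

# Gray-coded mapping (an assumed, conventional choice): adjacent levels
# differ by only one bit, so a single-level slicing error corrupts
# only one bit of the pair.
GRAY_MAP = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}

def pam4_encode(bits):
    """Group bits in pairs and map each pair to a PAM-4 level."""
    assert len(bits) % 2 == 0, "PAM-4 consumes bits two at a time"
    return [GRAY_MAP[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

bits = [0, 0, 0, 1, 1, 1, 1, 0]
print(pam4_encode(bits))  # [-3, -1, 1, 3]
```

Eight bits yield four symbols, which is exactly why the required baud rate drops and jitter budgets get easier.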
The internal-measurement feature requires one additional test slicer within the receiver chip, equipped with an adjustable "roving" slicer level and an adjustable clock phase. The chip automatically drives the adjustable level and clock phase in a sweeping pattern, tracing out the zone where the test slicer gives the same data as the real, properly adjusted slicer. Comparing the two received data patterns, you can trace out a complete "bath-tub" BER curve for the receiver showing the receiver margins—all while the chip is in live operation.
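The roving-slicer comparison can be sketched in software. The simulation below (my own toy model, not chip firmware) sweeps only the test-slicer threshold, not the clock phase, and compares its decisions against the reference slicer on noisy binary samples; the mismatch rate versus threshold traces out one axis of the bathtub-style margin curve.

```python
# Toy model of the "roving" test-slicer margin sweep (an illustration,
# not the actual on-chip measurement): compare a test slicer at a swept
# threshold against the reference slicer at threshold zero, and record
# the fraction of decisions that disagree.
import random

random.seed(1)

def slicer(sample, threshold):
    """A slicer is just a comparator: 1 above threshold, 0 below."""
    return 1 if sample > threshold else 0

def sweep_margin(n_bits=20000, noise=0.15):
    """Return (threshold, mismatch rate) pairs across the eye opening."""
    # Random binary data at nominal levels +/-1.0, plus Gaussian noise.
    samples = [(1.0 if random.random() < 0.5 else -1.0) + random.gauss(0, noise)
               for _ in range(n_bits)]
    curve = []
    for step in range(-10, 11):
        thresh = step * 0.1  # rove the threshold from -1.0 to +1.0
        errors = sum(slicer(s, thresh) != slicer(s, 0.0) for s in samples)
        curve.append((thresh, errors / n_bits))
    return curve

for thresh, rate in sweep_margin():
    print(f"{thresh:+.1f}  {rate:.4f}")
```

Near the center of the eye the two slicers always agree (mismatch rate zero); as the roving threshold approaches the signal levels the rate climbs steeply, which is the characteristic bathtub shape. The real feature adds a second sweep axis, clock phase, and runs on live traffic rather than simulated data.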
I claim this sort of embedded, internal self-measurement is superior to any measurement taken from the external terminals of the chip package. Someday, all chips will work like this.
Good, Basic Knowledge
The number of things you can do to fix a broken system (or shave cost out of one that works too well) is limited only by your personal knowledge of signal integrity. Don't pass up any opportunities to learn more about this subject.
The best way to learn is to step away from the simulator now and then and take some measurements. Then go out and meet some other digital engineers and talk to them about what you found. Read a book about general communications theory. Learn something new. Find out what parameters matter the most in your high-speed design, and what you can do to control them.