Software written to test the functionality of processor-based digital hardware differs markedly from software intended to test for ringing and crosstalk on your PCB. As you bring up your next processor-based system, you will need both types of test routines.
Unfortunately, software professionals who lack a hardware-design background find it difficult to understand the need for good ringing-and-crosstalk test routines. I wrote this article to explain to my friends in the software community why such tests are necessary, and what specific features are needed.
Testing for Crosstalk
All complex systems, whether they be made from electrical hardware, software, or biological elements, exhibit second-order noise effects. By the term "second-order" I mean that the system may appear to function perfectly most of the time (that's the first-order behavior), but occasionally still glitch, or conk out, in a way that causes erroneous behavior. Errors that appear sporadic and difficult to reproduce often arise from one of three sources: Hidden State Variables, Crosstalk, or Noise Aggregation.
HIDDEN STATE VARIABLES
The complete state of software subroutine Z is described by a collection of binary values for all of its relevant memory locations, registers, processor state variables, pointers, etc.
Let set A represent the collection of initial values for subroutine Z. If the operating system and processor are running properly, and if the external inputs to the routine are identical each time it is run, one normally expects subroutine Z to proceed from state A to compute precisely the same result every time it is executed. This tenet of repeatability is key to the operation of all software.
I'm sure you've had the experience of running code that doesn't seem to work this way. For example, suppose subroutine Z inadvertently depends on the value in some register X which is set by an alien subroutine. The output of subroutine Z will then change depending on whatever value happens to be lying around in register X when subroutine Z begins. In other words, the output of subroutine Z depends on the sequence of activities run prior to beginning the subroutine. This frustrating sort of bug may not show up until you happen to run a sequence of activities that inserts a particularly heinous value into register X.
The variable X acts as a "hidden state variable". It's important to the operation of the program, but not explicitly initialized as part of set A.
To fix this sort of bug you need to either initialize variable X before subroutine Z starts, or make sure the subroutine works regardless of the initial value of X (perhaps by ignoring it).
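For software readers, the bug and its fix can be sketched in a few lines of Python. This is only an illustration: the module-level variable `register_x` stands in for register X, and the function names are hypothetical.

```python
# Hypothetical illustration: register_x stands in for "register X".
register_x = 0  # hidden state, written by unrelated code


def alien_subroutine(value):
    """Unrelated code that leaves a value lying around in register_x."""
    global register_x
    register_x = value


def subroutine_z_buggy(a, b):
    """Reads register_x without initializing it (the hidden state variable)."""
    return a + b + register_x


def subroutine_z_fixed(a, b):
    """Initializes its own state, so prior history cannot change the result."""
    carry = 0  # explicitly part of set A
    return a + b + carry


alien_subroutine(0)
first = subroutine_z_buggy(2, 3)
alien_subroutine(7)
second = subroutine_z_buggy(2, 3)
print(first, second)  # same inputs, different results
```

The buggy version gives a different answer for the same inputs depending on what ran before it; the fixed version does not.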
Digital hardware exhibits a similar principle. A circuit works reliably, the same way every time, only when begun with the same initial state. The complete state of any transmission structure is specified by the voltages and currents at multiple positions along the structure, plus a number of other state variables pertaining to the state of things like terminating components, ac-coupling networks, package parasitics, and the power delivery network. Many of these effects may exhibit long, lingering tails of activity that persist well across bit boundaries. The only way to reset all these hidden state variables is to wait for the lingering effects to die out.
Unfortunately, in real-time operation the hidden state-variable effects don't have time to die out before the next bit. As a result the precise voltage received at the end of a long transmission structure is a function not only of the present bit transmitted, but also of the initial state established at the beginning of that bit by past history. Hardware designers refer to such interactions as ringing, reflections, or inter-symbol interference (ISI).
High-speed links commonly incorporate a budget for the maximum amount of inter-symbol interference (or residual ringing) permitted at the moment of sampling. This budget typically ranges anywhere from one to twenty-five percent of the signal swing depending on the logic family being used.
If you propose to write test routines that thoroughly test a piece of hardware, you should find out for each net class what is the maximum expected settling time of the ringing on that class of net. From this number you can determine the number N of previous bits of history that influence reception. A complete test routine then performs each test exercise beginning with all possible patterns of N preceding bits.
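A minimal sketch of that procedure follows, assuming the settling time and bit period are known in picoseconds. The numbers, function names, and the one-bit test exercise are all hypothetical placeholders, not values from any particular design.

```python
import math
from itertools import product


def history_depth(settling_time_ps, bit_period_ps):
    """Number N of preceding bits that can still influence the current bit."""
    return math.ceil(settling_time_ps / bit_period_ps)


def patterns_with_history(n_history, test_bits):
    """Yield every possible N-bit history followed by the test exercise bits."""
    for history in product((0, 1), repeat=n_history):
        yield list(history) + list(test_bits)


# Hypothetical net class: ringing settles in 3200 ps, bits last 1000 ps.
n = history_depth(settling_time_ps=3200, bit_period_ps=1000)
patterns = list(patterns_with_history(n, test_bits=[1]))
print(n, len(patterns))
```

The pattern count doubles with each additional bit of history, which is why the settling time of each net class matters so much to the size of the test.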
In some cases your hardware designer may know beforehand what pattern comprises the worst-case history. For example, when testing long serial links whose performance is limited by high-frequency attenuation, the worst-case test pattern is often a long string of zeroes extending on either side of a single, isolated one (or the inverse of this pattern).
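The isolated-bit pattern is simple to generate. The sketch below assumes the run length on each side is supplied by the hardware designer; the function name and arguments are hypothetical.

```python
def isolated_bit(run_length, bit=1):
    """Return a single `bit` with `run_length` opposite bits on either side."""
    background = 1 - bit
    return [background] * run_length + [bit] + [background] * run_length


lone_one = isolated_bit(3)          # zeroes around an isolated one
lone_zero = isolated_bit(3, bit=0)  # the inverse pattern
print(lone_one, lone_zero)
```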
In another example, if a link of a certain length is expected to exhibit reflections occurring at specific intervals of M bits, then bursts of variable history at those particular locations will have the greatest effect on the receiver.
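One way to target such a link is to vary only the history bits that sit M, 2M, 3M, ... bit times before the bit under test, holding everything else at an idle value. The sketch below is a simplification under that assumption: it varies a single bit at each location rather than a burst, and the names `m`, `n_reflections`, and `idle` are hypothetical.

```python
from itertools import product


def reflection_histories(m, n_reflections, idle=0):
    """Yield histories that vary only the bits M, 2M, ... before the test bit.

    The last element of each history is the bit sent immediately before
    the bit under test.
    """
    length = m * n_reflections
    for values in product((0, 1), repeat=n_reflections):
        history = [idle] * length
        for k, value in enumerate(values, start=1):
            history[length - k * m] = value  # the bit k*M positions back
        yield history


histories = list(reflection_histories(m=3, n_reflections=2))
print(histories)
```

This cuts the number of histories from 2^(M x reflections) to 2^reflections, at the cost of assuming the designer has correctly identified where the reflections land.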
CROSSTALK
Whereas the problems associated with hidden state variables involve events taking place before the initiation of subroutine Z, crosstalk involves events taking place cotemporaneously.
A perfect example of crosstalk within a software-based system would be the problem of stack overflow. If you have a stack of insufficient size, subroutine Z may work perfectly by itself, and another subroutine Y may work perfectly by itself, but when run together (one interrupting the other) the stack may overflow with dire consequences.
A certain degree of interaction through the stack is both desirable and beneficial. A stack reduces the overall memory required for a system (by permitting multiple re-use of the stack memory area) and simplifies the development (in that the programmer need not be concerned with the details of memory allocation on every subroutine call). These benefits are obtained at the risk of stack overflow.
In the digital world a similar philosophy applies to the management of crosstalk. The crosstalk between digital circuits is a strong function of their proximity. While it is possible to lay out a printed-circuit card with sufficient space between circuits to guarantee a practically un-measurable degree of crosstalk, the resulting structure would consume so much space and require so many layers for routing that the cost would become prohibitive. To obtain the benefit of greater circuit density one squeezes the design, compacting all the traces, and accepting at the same time a certain risk of failure due to increased crosstalk.
Although digital engineers strive to reach the correct balance between density and crosstalk, we occasionally overshoot the mark.
Digital crosstalk from aggressor to victim generally occurs when the aggressor changes state. Test routines intended to assist with the verification of crosstalk levels toggle each bit in the system (each possible aggressor) one at a time, up and then down. An oscilloscope operator can easily synchronize to this pattern and determine, for each victim of interest, the interaction amplitude from each aggressor. Aggressors that create substantial amounts of crosstalk stand out clearly in this sort of test.
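A walking-toggle generator of this kind might look like the following sketch. It assumes the aggressors are the bits of a single bus word that idles at a known value; how the word is actually written to hardware is left out, and the names are hypothetical.

```python
def walking_toggle(width, idle=0):
    """Yield bus words that toggle one bit at a time, then restore it."""
    for bit in range(width):
        yield idle ^ (1 << bit)  # aggressor `bit` changes state
        yield idle               # and returns to its idle level


sequence = list(walking_toggle(4))
print([format(word, "04b") for word in sequence])
```

Because only one aggressor moves at a time, each edge seen on the victim can be attributed to a single source, and its polarity can be recorded for later use.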
NOISE AGGREGATION
A stack, a disk, and a real-time operating system (RTOS) all involve attempts to allocate finite resources among various competing demands. The degree of resource utilization depends on the aggregation of demands.
As the level of demand heats up, the stack may overflow, the disk may fill up, and the RTOS may fail to complete its scheduled rounds before the next interrupt. A well-designed system should have a budget for the use of the stack, disk, and real-time resources. Part of system verification involves the monitoring of these resources to ensure that the usage remains reasonable given any anticipated pattern of system operation.
Crosstalk in a digital system behaves in a similar manner. The crosstalk received at any given node is the superposition of crosstalk from many possible aggressors. Complicating the situation is the fact that the interaction amplitude between two circuits may be either positive or negative. Crosstalk from a first aggressor may therefore sometimes mask the crosstalk from a second aggressor if the two interactions have opposite polarities.
A complete test program for any particular victim must begin by first identifying for each aggressor the polarity of crosstalk. Once the polarities are determined, the worst-case aggregate stimulation is formed by simultaneously changing the state of all aggressors in the following way:
- The group of all aggressors having a positive crosstalk interaction amplitude should change state in the positive direction.
- The group of all aggressors having a negative crosstalk interaction amplitude should change state in the negative direction.
The first group creates a maximum aggregation of positive crosstalk. The second group also creates positive crosstalk, because a negative interaction amplitude multiplied by a negative change in state yields a positive result. The superposition of both groups of changes produces the maximum positive crosstalk. The inverse pattern (group one going low while group two goes high) produces the maximum negative crosstalk.
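The two rules above can be sketched as a small routine that turns per-aggressor measurements into the two worst-case stimulus patterns. The aggressor names and millivolt values are hypothetical; each value is assumed to be the signed crosstalk seen on the victim for a rising edge on that aggressor, measured one aggressor at a time.

```python
def worst_case_transitions(amplitude_mv):
    """Choose the edge direction for each aggressor in both worst cases.

    amplitude_mv maps aggressor name -> signed crosstalk (mV) on the victim
    for a rising edge on that aggressor. An aggressor with zero amplitude
    contributes nothing, so its direction is arbitrary.
    """
    max_positive = {
        name: "rise" if amp > 0 else "fall"
        for name, amp in amplitude_mv.items()
    }
    max_negative = {
        name: "fall" if edge == "rise" else "rise"
        for name, edge in max_positive.items()
    }
    return max_positive, max_negative


measured = {"D0": 30, "D1": -20, "D2": 10}  # hypothetical measurements
max_pos, max_neg = worst_case_transitions(measured)
linear_sum_mv = sum(abs(amp) for amp in measured.values())
print(max_pos, max_neg, linear_sum_mv)
```

The linear sum is only a first estimate of the aggregate amplitude. The patterns themselves should still be run on the hardware and the result measured.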
The victim circuit is expected to operate successfully in the face of worst-case crosstalk while also enduring at the same time all relevant patterns of prior history.
The ability to program specific aggregations of aggressor behavior is crucial to the determination of worst-case crosstalk.
In some cases, such as a wide bus of uniform construction with a solid underlying reference plane, the designer may know from first principles that all crosstalk interaction amplitudes have the same polarity. In this case certain well-known patterns such as "everybody goes high except the victim" tend to elicit worst-case crosstalk. The converse, "everybody goes low while the victim stays high" also works.
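For a uniform bus, both patterns reduce to a pair of bus words per victim. The sketch below assumes the bus is written as a single integer word with bit 0 as the least significant line; the function name is hypothetical.

```python
def all_but_victim(width, victim):
    """Return (before, after) bus words for both same-polarity patterns."""
    mask = (1 << width) - 1
    victim_bit = 1 << victim
    rising = (0, mask & ~victim_bit)  # everybody goes high, victim held low
    falling = (mask, victim_bit)      # everybody goes low, victim held high
    return rising, falling


rising, falling = all_but_victim(width=4, victim=1)
print([format(word, "04b") for word in rising + falling])
```

Stepping `victim` across the width of the bus gives each line its turn as the quiet one.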
Interactions known as simultaneous switching noise (SSN) often occur within the power delivery system internal to an integrated circuit. The SSN interaction prevents crosstalk from aggregating in a strictly linear fashion as more aggressors are added to the fray. The SSN effect highlights the importance of explicitly checking the exact worst-case test pattern, as opposed to linearly summing the effects from each aggressor independently. The linear-sum procedure in some cases overestimates the actual worst-case crosstalk by as much as 100%.
You can't check every pattern of aggressors with every pattern of past history on every bit in a big system. That's an impossible task. Concentrate your efforts on blocks of logic where you expect high degrees of interaction. Also check around weak spots like connectors, chip packages, or other places where the integrity of the solid reference plane is compromised.
Provide specific tests to investigate asynchronous signals like clock, reset or interrupt lines. Make it easy for test technicians to switch on and off various noise-generating subsystems.
A good design process contemplates crosstalk verification while the ASIC floorplans are still under development. At this point you can insert extra test mode pins and JTAG test features. Later in the process it may prove impossible to add the test features you need to rigorously verify the design.
Crosstalk happens on the Internet, too. For example, back in January our newsletter listserver, called Topica, began injecting their own advertisements into these newsletters. As a result we pulled our list out of their hands and moved it to freelists.org. I thought that would be the end of the advertising mess. Well, as often happens with complex systems, changing one thing broke something else. Apparently, somebody got hold of our new account ID and used it to pester your mailboxes with even more advertisements. As far as we know they didn't get their hands on the list, they just found a way to feed their junk through it. Please believe me that we are doing everything we can to halt unauthorized use of this list. It is not my intention to pass along unsolicited advertisements.
Thanks for bearing with us.
Dr. Howard Johnson