The technology in question encompasses myriad different aspects. If there is a common thread it is related to communications and more specifically interfaces between different formats. I use the term format somewhat loosely here. I intend to include, among other cases, the notion of the “analog-digital boundary” and the “circuit-packet boundary”. The discussion is also geared to digital signal processing and the role DSP plays in telecommunications and the role of DSP at places where format conversion is necessary.
The first few episodes will address the analog-digital boundary and place the role of format conversion (between analog and digital) in perspective. Subsequent episodes will address the circuit-packet boundary and the challenges that must be overcome when format conversion is attempted.
Reader feedback is always useful to determine whether we are on the right track or not; whether we have become just another salesy blog or not; whether there are burning questions that need answers or at least discussion; whether what we pen makes actual sense or not;…and the list goes on. Please direct comments and questions to infoblog@floreatinc.com and please include your full name, company, and contact information so we can respond appropriately.
Is pass-through an alternative?
The human auditory system is quite forgiving. This, no doubt, is why the perceived quality of speech in VoIP (Voice over IP) systems is so good, even when there is some packet loss in the packet-switched (IP) segment. Packet Loss Concealment (PLC) methods do wonders in filling in snippets of sound in a way that is not irritating – to humans, anyway. Machines, such as the modems used for data and/or facsimile transmission, on the other hand, are very much less forgiving. There are no good ways to fill in snippets to replace signal segments lost because a packet went astray. Missing signal segments manifest themselves as data errors and, in some (extreme) cases, dropped calls.
One would imagine that, if there were no packet loss at all, then VoIP architectures could support cases where the phone call is not actually human speech but the modulated signal of a data call. The circuit-switched PSTN has been used in this manner for connecting modems and supporting facsimile machines and it is only natural to expect the packet-switched replacement to do the same, at the same level of quality of experience. An end-user who has a facsimile machine, and has used it satisfactorily for a long time, might look askance at being told that it will not work anymore just because the phone company has decided to migrate to next generation networks based on packet-switching.
A principal function of VoIP is the conversion between “circuit” and “packet”. In traditional telephony, samples of the analog voice-band signals are extracted at a sampling rate of 8 kHz and encoded into 8 bits per sample (ITU-T G.711). The resultant bit-stream (64 kbit/s) is referred to as a “DS0”. The conversion of this bit-stream into a packet format is achieved in one of several ways. In conventional VoIP, the DS0 is considered to be a speech signal and may be compressed using encoding schemes such as, for example, ITU-T G.729. Furthermore, the notion of silence provides for additional compression. At the receiving end the speech signal (DS0) is re-established. Most compression schemes used in VoIP are “lossy” in the sense that the recreated DS0 is not a carbon copy of the originating side but when converted back to analog it sounds pretty much like the original speech.
What if the voice-band signal was not speech? The end-point could well be a data modem or a facsimile machine. There are ways, such as monitoring the special tones generated by modems at the start of a call, to ascertain that the signal is not speech. It is also possible to distinguish between data modems and facsimile modems. Two general approaches to handling such machine signals are possible.
The preferred approach is to use the notion of modem-relay (data) and/or fax-relay (facsimile). These two approaches are addressed by ITU-T V.150 and ITU-T T.38 Recommendations, respectively. The idea is straightforward. The incoming voice-band signal is demodulated using the correct (soft) demodulator and the information packetized. Using error-correction schemes the deleterious impact of packet loss in the IP (packet-switched) segment can be mitigated. At the other end the depacketized information is then remodulated to mimic a fax/data modem signal. In some cases, such as “terminal T.38”, the depacketized information is used to create the facsimile image, in a manner analogous to what a fax machine would do, but stored in a file rather than printed immediately. Such relay and “terminal” methods provide a robust method of delivering facsimile and data content across an IP segment.
The less expensive approach is to use the notion of circuit-emulation. This approach is referred to as pass-through. Here the IP segment emulates a circuit-switched segment. The bytes of the DS0 are collected and packetized directly. Any special signal processing such as echo cancellation, automatic level control, compression, etc., is turned off in order to preserve the integrity of the information at the bit level. At the other end the jitter buffer is fixed at its largest size to maximize the packet network jitter that can be accommodated. Such an action is allowed because the machines (modems) are not as picky about delay as are human beings. The bytes are extracted from the packets, appropriately serialized, and the DS0 reconstructed. The term pass-through stems from the belief that the reconstructed DS0 is identical to the originating DS0.
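As a rough illustration of the pass-through idea, here is a minimal Python sketch. The function names, the 160-octet payload (20 ms of DS0 at 8 kHz), the sequence numbering, and the 0xFF filler are all assumptions made for the example and are not taken from any particular gateway. The octets are carried untouched, so with no loss the reconstructed stream matches the original.

```python
def packetize(ds0: bytes, payload_size: int = 160) -> dict[int, bytes]:
    """Split a DS0 octet stream into sequence-numbered payloads, untouched."""
    return {
        seq: ds0[start:start + payload_size]
        for seq, start in enumerate(range(0, len(ds0), payload_size))
    }


def reassemble(packets: dict[int, bytes], packet_count: int,
               payload_size: int = 160, filler: int = 0xFF) -> bytes:
    """Serialize payloads in sequence order; a missing packet becomes filler."""
    out = bytearray()
    for seq in range(packet_count):
        out += packets.get(seq, bytes([filler]) * payload_size)
    return bytes(out)


# One second of a made-up DS0: 8000 octets, i.e. 50 payloads of 160 octets.
ds0 = bytes((37 * n) % 256 for n in range(8000))
packets = packetize(ds0)
restored = reassemble(packets, len(packets))
```

Deleting any entry from `packets` before calling `reassemble` still yields 8000 octets, but the filled-in stretch no longer matches what was sent.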
This sounds very plausible in principle. However, the devil is in the details. First, any packet loss creates “holes” that have to be filled in and this is not possible to do error-free. Second, if the clock rate at the two gateways at the circuit-packet boundaries is not exactly the same then the reconstructed DS0 will not be the same as the original. Considering that even expensive oscillators can differ by several parts per million, pass-through is not likely to be robust. A fractional frequency offset of 125 parts per million between the two ends implies that, once every second, one sample will be lost or have to be “created” (at 8 kHz the sampling interval is 125 microseconds, which is exactly 125 parts per million of one second). In the case of speech, such sample “creation” or “deletion” can be done during periods of silence and thereby be unobjectionable. No such luck for voice-band modem signals.
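The slip arithmetic is easy to check. The small Python sketch below (the function name is mine, and the 5 parts per million case is simply an illustrative value) computes how long it takes for the two clocks to drift apart by one sample.

```python
def seconds_per_slip(offset_ppm: float, sample_rate_hz: float = 8000.0) -> float:
    """Time for two clocks offset by offset_ppm to drift apart by one sample."""
    slips_per_second = offset_ppm * 1e-6 * sample_rate_hz
    return 1.0 / slips_per_second


one_slip_at_125_ppm = seconds_per_slip(125.0)  # about one second
one_slip_at_5_ppm = seconds_per_slip(5.0)      # about 25 seconds
```

Even at a few parts per million, a call lasting several minutes will see a number of slips.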
The Analog-Digital Boundary — 2
Conversion from analog to digital formats, at the analog-digital boundary, is not necessarily perfect. The conversion involves “discretization” of the time domain whereby a continuous time signal is represented by sampled values at discrete time instants; the conversion involves quantization of the sample values since the representation is necessarily finite word-length. The rate at which the analog signal is sampled must be high enough or impairments related to aliasing can occur; the word-length must be sufficiently large so quantization noise is not excessive.
The oft-quoted Nyquist-Shannon sampling theorem, attributed to Harry Nyquist and Claude Shannon, provides a theoretical minimum sampling rate, the Nyquist rate, required to avoid aliasing. Specifically, the Nyquist rate is twice the (finite) bandwidth of the signal. Considering that it is more often than not difficult, if not impossible, to categorically state that a signal is finite bandwidth, aliasing is an unavoidable impairment. At best we can provide adequate filtering prior to conversion to keep the effects of aliasing within reasonable limits. For voice-band telephony signals we assume that prior to filtering the signal power at frequencies outside the band of interest (up to ~4 kHz) is less than the in-band signal power. With this assumption in place, filter characteristics are specified such that the stop-band attenuation is adequate to reduce out-of-band signal power to reasonable levels (see ITU-T Rec. G.712). Ideal filters, with infinite attenuation over bands of Fourier frequencies are not feasible – at best we can achieve infinite attenuation at some discrete frequencies and consequently aliasing is just a fact of life that we need to deal with. It should be emphasized that once it has occurred, it really cannot be undone and so there is the inevitable trade-off of cost (filter order and attenuation) versus aliasing impairments.
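To see why aliasing cannot be undone after the fact, consider this short Python sketch. The 5 kHz tone and the helper names are illustrative choices of mine. Sampled at 8 kHz, a 5 kHz cosine produces the same sample values as a 3 kHz cosine, so once sampled the two cannot be told apart.

```python
import math


def alias_frequency(f_hz: float, fs_hz: float = 8000.0) -> float:
    """Frequency in [0, fs/2] that a tone at f_hz lands on after sampling."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))


def sample_tone(f_hz: float, fs_hz: float = 8000.0, count: int = 16) -> list[float]:
    """Samples of a unit-amplitude cosine at f_hz taken at rate fs_hz."""
    return [math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(count)]


out_of_band = sample_tone(5000.0)
in_band = sample_tone(alias_frequency(5000.0))  # 3000 Hz
indistinguishable = all(abs(a - b) < 1e-9 for a, b in zip(out_of_band, in_band))
```

The only remedy is to attenuate the 5 kHz component before the sampler sees it.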
The discretization of the “value domain” is another fact of life. Whereas true “analog samples” can be viewed as real numbers, for practical reasons the representation in the digital domain requires viewing the sample values as “integers”. Restriction to finite word-length is accompanied by additive quantization noise – another compromise that must be dealt with. Clearly increasing the word-length can, if done correctly, reduce the quantization noise power at the expense of hardware complexity (cost). Extensive research went into the development of the μ-law and A-law conversion characteristics and for telephony purposes these conversion characteristics are standardized (see ITU-T Rec. G.711).
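The word-length versus noise trade-off can be illustrated with a small experiment. The Python sketch below is for a uniform (linear) quantizer only, not the μ-law or A-law characteristics, and the test tone and function names are assumptions of mine. It compares the measured signal-to-quantization-noise ratio of a full-scale sine wave with the familiar rule of thumb of roughly 6.02 dB per bit plus 1.76 dB.

```python
import math


def measured_snr_db(bits: int, count: int = 20000) -> float:
    """Quantize a full-scale sine uniformly and measure the resulting SNR."""
    step = 2.0 / (2 ** bits)           # uniform step over the range [-1, 1)
    signal_power = noise_power = 0.0
    for n in range(count):
        x = math.sin(0.1234567 * n)    # test tone at an arbitrary frequency
        q = round(x / step) * step     # uniform quantization to the nearest level
        signal_power += x * x
        noise_power += (q - x) ** 2
    return 10.0 * math.log10(signal_power / noise_power)


def rule_of_thumb_snr_db(bits: int) -> float:
    """Textbook approximation for a full-scale sine and a uniform quantizer."""
    return 6.02 * bits + 1.76
```

Each additional bit buys about 6 dB, which is why a uniform code needs more bits than the companded octet to cover the same dynamic range.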
A conventional telephony codec provides samples at a rate of 8 kHz and uses one octet (8 bits) to represent the sample value. The coding scheme is not uniform so typical arithmetic operations (such as addition) cannot be done by conventional digital hardware arithmetic logic units without conversion to a uniform code. It is well known that, in order to preserve the signal-to-noise ratio, the uniform code must be at least 14 bits for μ-law (13 bits for A-law); most conversions done in (software) digital signal processing applications use 16 bits. A common error made by software engineers who do not understand the principles of μ-law and A-law coding is to use “typecasting” between a “short” and a “char” to achieve the conversion (they should really read and understand G.711!).
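To make the point about typecasting concrete, here is a minimal Python sketch of μ-law expansion. It follows the widely circulated reference-style decoder that produces output on a 16-bit scale; the function names are mine, and the result should be checked against the tables in G.711 before being relied upon. The second function shows what a bare cast to a signed type would produce instead.

```python
BIAS = 0x84  # 132, the offset applied before the segment (exponent) shift


def ulaw_to_linear(code: int) -> int:
    """Expand one mu-law octet to a linear sample on a 16-bit scale."""
    u = ~code & 0xFF                      # mu-law octets are stored inverted
    magnitude = ((u & 0x0F) << 3) + BIAS  # 4-bit mantissa plus bias
    magnitude <<= (u & 0x70) >> 4         # 3-bit segment acts as an exponent
    return BIAS - magnitude if u & 0x80 else magnitude - BIAS


def naive_cast(code: int) -> int:
    """What reinterpreting the octet as a signed char gives (incorrect)."""
    return code - 256 if code >= 0x80 else code
```

For the octet 0x80 the expansion gives the largest positive value (32124 on this scale), whereas the cast gives -128; for 0xFF the expansion gives 0 and the cast gives -1. The cast gets neither the sign convention nor the logarithmic spacing right.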
The Analog-Digital Boundary — 1
One inescapable problem with analog communications is the accumulation of noise. As one great philosopher named Murphy put it, the only way to remove noise completely is to not communicate at all. That not being a viable option, we are resigned to the fact that noise elimination is possible but is accompanied by the unfortunate side effect of eliminating the information signal itself. And noise accumulates with every “hop”. All electronic equipment adds noise in one form or another. As an analog signal propagates from source to destination it also gets attenuated and the logical approach of signal amplification adds noise. It is a fact of life in analog communications that signal-to-noise ratio never improves – at best it remains constant. Murphy developed corollaries of Newton’s laws whereby the signal-to-noise ratio remains constant only at absolute zero temperature and, furthermore, we can never achieve absolute zero temperature. The best we can do is minimize, to the best of our ability, the power of the additive noise thereby achieving the best noise figure we can hope for (noise figure is a measure of the degradation of signal-to-noise ratio).
Digital communication provides a solution – of a sort. The idea is to eliminate the accumulation of noise, at least as it happens in analog situations. The problem is transformed from the notion of additive noise to one of bit-errors. As a digital signal is transported from source to destination there is additive noise but rather than being accumulative in a linear time-invariant sense this noise corrupts the signal resulting in a possibly incorrect decision whereby a “zero” is interpreted as a “one” or vice versa. The possibility of regenerating the digital signal before the (inevitable) additive noise impacts the probability of error essentially means it is possible to have error-free, or nearly so, transmission. The well-known Shannon “Channel Capacity” theorem provides a guide as to the bit-rate that is achievable, essentially error-free, over a channel as a function of bandwidth and signal-to-noise ratio.
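For a channel with additive white Gaussian noise, the capacity theorem takes the familiar form C = B log2(1 + S/N), with B the bandwidth in hertz and S/N the signal-to-noise ratio as a power ratio. The Python sketch below evaluates it; the function name and the example figures (3100 Hz of bandwidth and 30 dB of signal-to-noise ratio, roughly voice-band values) are assumptions chosen purely for illustration.

```python
import math


def channel_capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon capacity of an additive white Gaussian noise channel, in bit/s."""
    snr_linear = 10.0 ** (snr_db / 10.0)   # convert dB to a power ratio
    return bandwidth_hz * math.log2(1.0 + snr_linear)


voice_band = channel_capacity_bps(3100.0, 30.0)  # roughly 31 kbit/s
```

Note that doubling the bandwidth doubles the capacity, whereas doubling the signal power adds only about one bit per second per hertz at high signal-to-noise ratio.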
Digital communication does appear to be a panacea of sorts. One fly in the ointment, as it were, is the fact that the information is often analog. For example, the most common of human communications over a network is speech (voice-band). Between mouth and ear there are many conversions. At the sender end these include acoustic pressure to electrical, essentially analog, and then analog to digital; at the receiving end are the reverse conversions from digital to analog and electrical to acoustic.
Conversion from analog to digital formats, at the analog-digital boundary, is not necessarily perfect. The conversion involves “discretization” of the time domain whereby a continuous time signal is represented by sampled values at discrete time instants; the conversion involves quantization of the sample values since the representation is necessarily finite word-length. The rate at which the analog signal is sampled must be high enough or impairments related to aliasing can occur; the word-length must be sufficiently large so quantization noise is not excessive.






