Very Low Bit Rate Parametric Audio Coding

Dissertation
Heiko Purnhagen
Fakultät für Elektrotechnik und Informatik
Gottfried Wilhelm Leibniz Universität Hannover
2008

Abstract

In this thesis, a parametric audio coding system for very low bit rates is presented. It is based on a generalized framework that combines different source models into a hybrid model and thereby permits flexible utilization of a broad range of source and perceptual models. The developed parametric audio coding system allows efficient coding of arbitrary audio signals at bit rates in the range of approximately 6 to 16 kbit/s.

The use of a hybrid source model requires that the audio signal be decomposed into a set of components, each of which can be adequately modeled by one of the available source models. Each component is described by a set of model parameters of its source model. The parameters of all components are quantized and coded and then conveyed as a bit stream from the encoder to the decoder. In the decoder, the component signals are resynthesized according to the transmitted parameters. By combining these signals, the output signal of the parametric audio coding system is obtained.
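The decoder side of this scheme can be illustrated with a minimal Python sketch. It is not the HILN decoder: the parameter names (`model`, `freq_hz`, `amplitude`), the 16 kHz sample rate, and the uniform white noise generator are assumptions chosen for illustration only. It shows just the principle that each component is resynthesized from its parameters and the component signals are summed.

```python
import math
import random

SAMPLE_RATE = 16000  # Hz; assumed value for this illustration


def synth_sinusoid(freq_hz, amplitude, num_samples, sample_rate=SAMPLE_RATE):
    """Resynthesize one sinusoidal component from its (hypothetical) parameters."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def synth_noise(amplitude, num_samples, seed=0):
    """Resynthesize a noise component as scaled uniform white noise (a simplification)."""
    rng = random.Random(seed)
    return [amplitude * rng.uniform(-1.0, 1.0) for _ in range(num_samples)]


def decode_frame(components, num_samples):
    """Resynthesize every component and sum the component signals."""
    output = [0.0] * num_samples
    for comp in components:
        if comp["model"] == "sinusoid":
            signal = synth_sinusoid(comp["freq_hz"], comp["amplitude"], num_samples)
        elif comp["model"] == "noise":
            signal = synth_noise(comp["amplitude"], num_samples)
        else:
            raise ValueError("unknown source model: " + str(comp["model"]))
        output = [o + s for o, s in zip(output, signal)]
    return output


# Hypothetical decoded parameters for one frame
frame = decode_frame(
    [{"model": "sinusoid", "freq_hz": 1000.0, "amplitude": 0.5},
     {"model": "noise", "amplitude": 0.1}],
    num_samples=32,
)
```

A real parametric decoder would additionally interpolate parameters between frames and shape the noise spectrally; both are omitted here to keep the sketch short.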

The hybrid source model developed here combines sinusoidal trajectories, harmonic tones, and noise components and includes an extension to support fast signal transients. The encoder employs robust algorithms for the automatic decomposition of the input signal into components and for the estimation of the model parameters of these components. A perceptual model in the encoder guides signal decomposition and selects the perceptually most relevant components for transmission. Advanced coding schemes exploit the statistical dependencies and properties of the quantized parameters for efficient transmission.

The parametric approach facilitates extensions of the coding system that provide additional functionalities. Independent time-scaling and pitch-shifting are supported by the signal synthesis in the decoder. Bit rate scalability is achieved by transmitting the perceptually most important components in a base layer bit stream and further components in one or more enhancement layers. Error robustness for operation over error-prone transmission channels is achieved by unequal error protection and by techniques to minimize error propagation and to provide error concealment.
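The layering idea behind bit rate scalability can be sketched as follows. This is an illustration under stated assumptions, not the bit stream format of the standard: the `relevance` score stands in for the output of a perceptual model, and the function names and layer sizes are hypothetical.

```python
def split_into_layers(components, layer_sizes):
    """Rank components by descending relevance and cut the ranking into layers.

    layer_sizes[0] is the number of components in the base layer; the
    remaining entries give the sizes of the enhancement layers.
    """
    ranked = sorted(components, key=lambda c: c["relevance"], reverse=True)
    layers = []
    start = 0
    for size in layer_sizes:
        layers.append(ranked[start:start + size])
        start += size
    return layers


def decode_layers(layers, num_received):
    """Collect the components of the base layer plus the received enhancement layers."""
    return [comp for layer in layers[:num_received] for comp in layer]


# Hypothetical components with assumed perceptual relevance scores
components = [
    {"name": "a", "relevance": 0.9},
    {"name": "b", "relevance": 0.2},
    {"name": "c", "relevance": 0.7},
    {"name": "d", "relevance": 0.4},
]
layers = split_into_layers(components, layer_sizes=[2, 1, 1])
base_only = decode_layers(layers, num_received=1)
```

With these values the base layer carries components `a` and `c`; a decoder that receives only the base layer still reconstructs the most relevant part of the signal, and each additional enhancement layer adds the next components in the ranking.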

The resulting coding system was standardized as the Harmonic and Individual Lines plus Noise (HILN) parametric audio coder in the international MPEG-4 Audio standard. Listening tests show that HILN achieves an audio quality comparable to that of established transform-based audio coders at 6 and 16 kbit/s.

Keywords: parametric audio coding, signal decomposition, parameter estimation, source model, perceptual model, MPEG-4 HILN

The dissertation was published by TIB/UB in electronic form. It is available as a PDF file; the presentation slides are also available as a PDF file.

Audio Demonstration: MPEG-4 HILN at 6 kbit/s