Lab 4: Linear predictive coding of music

Task

The goal of the lab is to build a functioning linear predictive coder for music, measure its rate-SNR performance, and check its subjective performance.

The work should be done in groups of 1 or 2 students.

Examination of the lab is by written report.

Test data

For your first experiments, use these two short music files as test data:

For the final tests, use these full length files:

All the test files are mono, sampled at 44.1 kHz and quantized to 16 bits/sample. This means that the raw uncoded data rate is 705.6 kbit/s.

Function skeleton

Start with this Matlab function skeleton:

pred_coder.m

The function implements a linear predictive coder. As a starting point, no prediction is performed and the prediction error is always quantized to zero. Your job is to edit the function so that you get a fully functional predictive coder.

Here is also a function skeleton for the corresponding predictive decoder. A predictive coder always contains a decoder as well, but it might be a good idea to also use a separate decoder to make sure your coder works as intended. Edit the decoder to create dhat and xhat in the same way as the coder does and make sure that the coder and decoder functions return exactly the same xhat.

pred_decoder.m

Uniform quantization

The first thing you should do is to add proper quantization. Since the coder also contains a decoder, you need both quantization and reconstruction.

The quantizer is a uniform quantizer, just like the one you used in lab 3.
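As a reminder of the idea, here is a minimal sketch of uniform quantization followed by reconstruction. It assumes a midtread quantizer and uses delta as the name of the step size; both are assumptions made here, so check them against what you did in lab 3 and against the variable names in the skeleton.

```matlab
% Sketch: uniform (midtread) quantization of a prediction error d.
% delta is an assumed name for the quantizer step size.
delta = 0.5;
d = [-1.3 -0.2 0 0.24 0.26 2.1];   % example prediction error values

q    = round(d / delta);   % integer quantizer indices (these get source coded)
dhat = q * delta;          % reconstructed prediction error

% The reconstruction error never exceeds half a step.
max_err = max(abs(d - dhat));
```

A smaller delta gives lower distortion but more distinct values of q, and therefore a higher rate.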

With a functioning quantizer, you can measure distortion (mean square error) and SNR for your coder.
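A minimal sketch of the measurement, assuming SNR is defined as signal power divided by distortion and expressed in dB. If lab 3 or the course literature uses the signal variance instead, use that definition; for zero-mean music the two are nearly the same.

```matlab
% Sketch: distortion (mean square error) and SNR in dB.
x    = [1 -2 3 -4 5 -6];                  % example original samples
xhat = [1.1 -1.9 3.1 -4.1 4.9 -6.1];      % example decoded samples

D      = mean((x - xhat).^2);             % distortion
P      = mean(x.^2);                      % signal power (assumed definition)
snr_db = 10 * log10(P / D);               % SNR in dB
```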

Prediction

Next you should add prediction. In order to choose a good predictor, find predictor coefficients a_i that minimize the prediction error variance (see the relevant parts of the course literature). For this you will need to estimate the autocorrelation function R_xx(k) for the input data. How to do this is described on the course slides.
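The optimization can be sketched as follows. This is only an illustration under stated assumptions: it uses a synthetic first-order test signal instead of music, a biased autocorrelation estimate, and the standard normal equations with a Toeplitz matrix. The estimator described on the course slides may differ in details, and the variable names are chosen here.

```matlab
% Sketch: estimate R_xx(k) and solve the normal equations for order N.
N = 2;                                   % predictor order
n = 50000;
x = filter(1, [1 -0.9], randn(n, 1));    % test signal: x(k) = 0.9 x(k-1) + noise

% Biased autocorrelation estimate R_xx(k), k = 0..N (x assumed zero mean).
R = zeros(N + 1, 1);
for k = 0:N
    R(k + 1) = sum(x(1:n-k) .* x(1+k:n)) / n;
end

% Normal equations: toeplitz(R_xx(0..N-1)) * a = R_xx(1..N).
a = toeplitz(R(1:N)) \ R(2:N+1);

% Prediction error variance obtained with these coefficients.
sigma2_d = R(1) - a' * R(2:N+1);
```

For this test signal the result should be close to a = [0.9; 0], and sigma2_d should be clearly smaller than R(1), the variance of x.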

The function skeleton is written as if you do this predictor optimization outside the function and supply it as an input argument, but you could also put this code inside the function.

Suitable predictor sizes are 1-8. The optimal size depends on other choices in the coder. For instance, with coarse quantization you want a shorter predictor than with fine quantization.
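Putting quantization and prediction together, the main loop might look like the sketch below. It is not the contents of pred_coder.m: the coefficient values, the step size delta and the handling of the first samples are assumptions for illustration. The point is that the coder predicts from xhat rather than from x, which is what lets a separate decoder reproduce exactly the same xhat.

```matlab
% Sketch: closed-loop predictive coder and matching decoder.
% a, delta and the loop details are assumptions, not the course skeleton.
a     = [0.9; -0.2];                     % example predictor coefficients
delta = 0.1;                             % example quantizer step size
N     = length(a);
x     = sin(0.05 * (1:200)');            % example input signal
n     = length(x);

% Coder: predict from previously reconstructed samples.
q    = zeros(n, 1);
xhat = zeros(n, 1);
for k = 1:n
    p = 0;
    for i = 1:min(N, k - 1)
        p = p + a(i) * xhat(k - i);      % prediction from xhat, not from x
    end
    q(k)    = round((x(k) - p) / delta); % quantized prediction error index
    xhat(k) = p + q(k) * delta;          % same reconstruction as the decoder
end

% Decoder: sees only q, a and delta.
xdec = zeros(n, 1);
for k = 1:n
    p = 0;
    for i = 1:min(N, k - 1)
        p = p + a(i) * xdec(k - i);
    end
    xdec(k) = p + q(k) * delta;
end

same = isequal(xhat, xdec);              % true when both loops match
```

Because the prediction is made from xhat, the error in each decoded sample equals the quantization error of that sample and does not accumulate.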

Source coder

The next thing to add is source coding, so that you can measure the resulting rate of the full coder. Do Huffman coding of the integer signal q just as you did in lab 3.

Since the function returns q, you can do the Huffman coding outside the function. It can also be added inside the function, after the main loop.
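If you only need the rate and not the actual codewords, the mean Huffman codeword length can be computed directly from the histogram of q, as in this sketch. It is an alternative to the lab 3 routine, not a copy of it, and it ignores the cost of transmitting the code table. It relies on the fact that each merge in the Huffman algorithm adds one bit to every symbol below the merged node.

```matlab
% Sketch: mean Huffman codeword length (bits/sample) from the histogram of q.
q = [0 0 0 0 1 1 -1 2];                  % example quantizer indices

symbols = unique(q);
counts  = zeros(1, length(symbols));
for i = 1:length(symbols)
    counts(i) = sum(q == symbols(i));
end
p = counts / sum(counts);                % estimated symbol probabilities

% Repeatedly merge the two least probable nodes. The mean codeword
% length is the sum of the probabilities of all merged nodes.
nodes = p(:)';
rate  = 0;
while length(nodes) > 1
    nodes  = sort(nodes);
    merged = nodes(1) + nodes(2);
    rate   = rate + merged;
    nodes  = [merged nodes(3:end)];
end

H = -sum(p .* log2(p));                  % entropy, a lower bound on the rate
```

If q contains only a single distinct value the loop never runs and rate stays 0, so that special case needs separate handling in a real coder.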

Full coder experiments

Now that you have a fully functioning coder, it is time to see how well it performs and how the choices of parameters affect the performance.

Plot rate-SNR curves (see lab 3) for different predictor sizes and compare them to each other. Suitable rates are between 2 and 6 bits/sample (this corresponds to rates between 88.2 and 264.6 kbit/s).
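The conversion between bits/sample and kbit/s used above is simply a multiplication by the sampling frequency, as this small sketch shows (variable names are chosen here):

```matlab
% Sketch: convert rates in bits/sample to kbit/s for 44.1 kHz mono audio.
fs   = 44100;                 % sampling frequency in Hz
R    = [2 6];                 % rates in bits/sample
kbps = R * fs / 1000;         % rates in kbit/s, here 88.2 and 264.6

raw_kbps = 16 * fs / 1000;    % uncoded rate at 16 bits/sample, 705.6 kbit/s
ratio    = raw_kbps ./ kbps;  % compression ratio relative to the raw data
```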

To make the comparison easier, plot the curves in the same figure.

Subjective performance

It is also interesting to check the subjective performance, i.e., the quality of the decoded music as perceived by a human listener.

Write the decoded music to a file (use audiowrite) and listen to it.
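A minimal sketch of writing the result to a file. It assumes that the samples are floating point values in the range [-1, 1], which is what audioread returns by default. If you work with integer-valued samples instead, scale them or convert them to int16 first. The sine wave and the file name are stand-ins for illustration.

```matlab
% Sketch: write decoded audio to a WAV file and read it back.
fs   = 44100;                                       % sampling frequency in Hz
xhat = 0.5 * sin(2 * pi * 440 * (0:fs-1)' / fs);    % stand-in for decoded music

y     = max(min(xhat, 1), -1);                      % keep samples within [-1, 1]
fname = [tempname() '.wav'];                        % hypothetical output file name
audiowrite(fname, y, fs);

[z, fs2] = audioread(fname);                        % read back to check the file
```

Samples outside [-1, 1] are clipped when written, which is audible as distortion, so it is worth checking the range of xhat before writing.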

What is the lowest rate you can use and not hear any difference between the original and the coded music?

What is the lowest rate you can use and still have an acceptable quality?

Bonus problem 1: Stereo music (not mandatory)

Usually music is in stereo, i.e., it consists of two channels (left and right). The simplest way to code this is to treat the two channels as two separate audio signals. This basically means that we get double the rate at a given quality, compared to a mono signal.

Since the two channels are often very similar, a simple way of utilizing the similarity is to code a sum signal (L+R) and a difference signal (L-R) instead.
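The transform and its inverse can be sketched as below, with example values chosen here. When the channels are similar, the difference signal has a much smaller amplitude than either channel and should therefore be cheaper to code.

```matlab
% Sketch: sum/difference transform of a stereo signal and its inverse.
L = [100 102  98 -50];        % example left channel samples
R = [ 99 101  97 -52];        % example right channel samples

S = L + R;                    % sum signal
D = L - R;                    % difference signal (small when L and R are similar)

% Inverse transform, applied after decoding S and D.
Lrec = (S + D) / 2;
Rrec = (S - D) / 2;
```

With lossy coding the quantization errors of S and D are both mixed into the reconstructed channels, so measure the SNR on L and R rather than on S and D.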

Try this on these stereo test signals and see if you can get better performance than just coding the left and right channels independently.

Bonus problem 2: Frames (not mandatory)

In a real world application, we usually want to be able to start decoding a coded music signal at any position, without having to decode everything up to that point. For instance, if you are streaming music over the internet, you want to be able to jump into the middle of a song.

To solve this, the music is divided into small parts (usually referred to as frames) that are coded independently of each other. An advantage of this is that we can adapt the coder to local statistics, but on the other hand we get more extra information to transmit (the decoder needs to know the predictor coefficients, quantization parameters and source coding parameters for each frame instead of just once for the whole song).

As an example, if we need to be able to start decoding with one-second resolution, each frame can contain at most 44100 samples.
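Splitting a signal into frames can be sketched like this. The signal length and the variable names are assumptions for illustration; the last frame is allowed to be shorter than the others.

```matlab
% Sketch: start and end indices of frames with at most frame_len samples.
n         = 100000;           % example signal length in samples
frame_len = 44100;            % at most one second per frame at 44.1 kHz

num_frames = ceil(n / frame_len);
first = (0:num_frames-1) * frame_len + 1;     % first sample of each frame
last  = min(first + frame_len - 1, n);        % last sample of each frame
```

Frame f then consists of the samples first(f) to last(f), and gets its own predictor, quantizer and source code. Remember to count the rate needed for these per-frame parameters when you compare with the single-predictor coder.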

Try frame coding to see if you can get better performance than having just a single predictor for the whole song.

Examination

Examination of the lab is by a short written report. Describe how you solved the problems and what your results are. Also include any program code you’ve written.

Send an electronic version of your report (in PDF format) to Harald. Give the name, person number and email address of every group member.

Deadline

No hard deadline. I would prefer if you send in your reports before the exam period. If you are late, you might have to wait until the next exam period for your points to be reported into Ladok.

Questions?

If you have any questions about the lab, contact Harald.