An Experiment in Video Encoding
Intensive testing of mencoder options


Introduction

In this article, we study the influence of various encoding parameters available in mencoder on the quality of the resulting video, with regard to the corresponding bitrate.

Methodology and choices

Samples

To test the codecs against various situations, we chose to work with three videos corresponding to three archetypes: a calm movie, a fast-moving movie, and an anime. The samples we settled on are Le Fabuleux Destin d'Amélie Poulain, The Matrix and Perfect Blue.

All samples are about half an hour long. Since it is generally not possible to change the parameters during encoding, it makes sense to choose parameters that give good results on a typical movie, and not just on a particular scene. Furthermore, long samples are statistically more significant: with short samples, there is a risk of getting especially good or bad results just because the encoded scene has some specific feature that a particular codec handles especially well or badly.

All samples are taken from 16:9 PAL 720×576 25 FPS DVDs. All are cropped to a common, smaller size of 696×416. For Amélie, the cropped area was mostly black pixels; for Matrix, it contained a few pixels of actual content; for Perfect Blue, which is 1.85:1, it was mostly content.

Quality evaluation

These tests use only objective quality evaluation. Since several hundred videos were encoded and evaluated, jury-based methods were simply out of the question.

One consequence is that options that rely on subjective perception rather than objective video quality could not be evaluated. For example, an option that saves bits in fast scenes to spend them in slow ones would only decrease the objective quality.

We used two metrics for these tests. The first is the usual mean square error / PSNR. The second is the metric described in A Universal Image Quality Index, by Zhou Wang and Alan C. Bovik, which relies mostly on the linear correlation between the images. Both were evaluated on the YUV12 stream, meaning that the luminance component has a weight of four while each chrominance component has a weight of one. The quality was evaluated both at the original size of 696×416 and after scaling to a hypothetical display resolution of 1280×538 with square pixels, referred to as HD in the rest of this document. We had intended to use a resolution of 1920×808, but it was much too slow, so we settled for a resolution typical of today's screens and video projectors. We will see later in the article how the choice of quality evaluation changes the results.
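To make the two metrics concrete, here is a minimal sketch in Python. It is not the evaluation code used for these tests. The function names, the representation of planes as flat lists, and the handling of a zero denominator are assumptions made for illustration. Pooling the squared error over all samples of a 4:2:0 frame is one reading of the four-to-one weighting, since the luminance plane holds four times as many samples as each chrominance plane. The index is computed here over a single window, whereas the paper averages it over sliding windows.

```python
import math


def pooled_psnr(ref_planes, enc_planes, peak=255):
    """PSNR over every sample of a planar frame.

    Each argument is a sequence of planes (Y, U, V), each plane a flat
    sequence of sample values. With 4:2:0 content the Y plane has four
    times as many samples as U or V, hence a 4:1:1 weighting.
    """
    squared_error = 0
    count = 0
    for ref, enc in zip(ref_planes, enc_planes):
        if len(ref) != len(enc):
            raise ValueError("plane size mismatch")
        squared_error += sum((a - b) ** 2 for a, b in zip(ref, enc))
        count += len(ref)
    if squared_error == 0:
        return math.inf  # identical frames
    mse = squared_error / count
    return 10 * math.log10(peak ** 2 / mse)


def quality_index(x, y):
    """Wang-Bovik index Q = 4*cov*mx*my / ((vx + vy) * (mx^2 + my^2)),
    computed over one window given as two flat sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    denominator = (vx + vy) * (mx ** 2 + my ** 2)
    if denominator == 0:
        # Convention chosen for this sketch only: flat identical windows
        # count as a perfect match, anything else as no match.
        return 1.0 if list(x) == list(y) else 0.0
    return 4 * cov * mx * my / denominator
```

The index ranges from -1 to 1 and reaches 1 only when the two windows are identical, which is why it behaves mostly like a correlation measure.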

Encoding method

We chose to use fixed-quantizer encoding. Multipass average-bitrate-directed encoding is useful when aiming for a particular file size, especially to fit on limited storage devices like CDs. Nowadays, storage is less of a problem, and it makes more sense simply to choose a level of quality and accept whatever bitrate it requires. A fixed quantizer is the closest setting we found to choosing a level of quality. We did not try the crf option with x264.

Note that it is not a contradiction not to care about the actual resulting file size and yet to want to squeeze every last drop of quality from each invested bit.

Before encoding, the hqdn3d filter was applied. Of course, it was applied to the original video too when evaluating the quality; we did not want to test the strength of the filter. Apart from scaling and cropping, no other filters were applied. Scaling was done with the default options.

What this article does not address

This article does not address at all the question of audio encoding, nor the question of container formats.

We do not care about compatibility, whether software (if your home DVD/DivX® player does not know about H.264, too bad) or hardware (if your Athlon 1800 is too slow to decode H.264 at 1024×576 in real time, too bad).

We care very little about encoding time, and not at all about realtime encoding.

Exploratory experiment

Object

The first phase of this experiment is to explore the whole range of quantizer values on a limited set of encoding settings, varying parameters that should have a big influence on the resulting quality.

The following parameters were explored:

The exact encoding parameters were the following:

The quantizer values were explored from 1 to 31 or 51 in steps of 2.
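As a rough illustration of such a sweep, the sketch below generates one mencoder command line per quantizer value. It is a hypothetical reconstruction, not the settings actually used: the option strings (vqscale for lavc, fixed_quant for XviD, qp for x264, whose name may differ in older mencoder revisions), the filter chain and its order, the file names, and the assumption that the upper bound is 31 for the MPEG-4 codecs and 51 for x264 are all ours.

```python
def quantizer_sweep(max_quantizer, step=2):
    """Quantizer values from 1 up to max_quantizer, in steps of `step`."""
    return list(range(1, max_quantizer + 1, step))


# Hypothetical option templates and upper quantizer bound per codec.
CODEC_OPTIONS = {
    "lavc": ("-ovc lavc -lavcopts vcodec=mpeg4:vqscale={q}", 31),
    "xvid": ("-ovc xvid -xvidencopts fixed_quant={q}", 31),
    "x264": ("-ovc x264 -x264encopts qp={q}", 51),
}


def sweep_commands(source, codec):
    """Yield one fixed-quantizer mencoder command line per quantizer."""
    template, max_quantizer = CODEC_OPTIONS[codec]
    for q in quantizer_sweep(max_quantizer):
        yield (
            f"mencoder {source} -nosound "
            f"-vf crop=696:416,hqdn3d "   # assumed filter chain and order
            f"{template.format(q=q)} "
            f"-o {codec}-q{q:02d}.avi"    # hypothetical output name
        )


if __name__ == "__main__":
    for command in sweep_commands("sample.vob", "x264"):
        print(command)
```

With these bounds, the sweep covers 16 quantizer values for the MPEG-4 codecs and 26 for x264.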

Results

In total, twenty settings were tested, except for Perfect Blue, where there were 28. It took a little more than a month on one Core 2 Duo E6400, one Athlon 64 X2 3800 and one slower computer.

Comparison of metrics

First of all, let us compare the results given by the four metrics (PSNR and correlation, at native and HD size). The first curve plots the correlation metric at HD and native size. The second curve does the same for PSNR. The third plots the correlation against the PSNR, at HD size. We thought it useless to plot the fourth side of the square.

[expl-corr-hd-native] [expl-psnr-hd-native] [expl-corr-psnr-hd]

From the first two curves, we can see that the resolution of evaluation has very little influence, except for the wide variant at high quality, where HD evaluation gives better results.

From the last curve, we can see that correlation and PSNR give a very similar evaluation of the quality. Strangely, anime content performs better with correlation, by a constant factor. And correlation performs better with wide prescaling.

In the rest of the article, we will mostly use the HD correlation curves, but we will try to check them against the native PSNR curves.

Quality curves

Here are the results for each of the samples. The abscissa is the bitrate, the ordinate is the correlation metric at HD size. The vertical grey line shows the bitrate of the source DVD content.

[expl-bitrate-correl-amelie] [expl-bitrate-correl-matrix] [expl-bitrate-correl-pblue]

Here are the results with the PSNR at native size as ordinate:

[expl-bitrate-psnr-amelie] [expl-bitrate-psnr-matrix] [expl-bitrate-psnr-pblue]

In the XviD tests for Matrix, an abnormal loss of quality was noted except for very low quantizers. It was tracked down to the fact that XviD decided to drop a black frame in the middle of the movie: since the quality evaluation works frame by frame, every later frame was compared with the wrong reference frame, so the evaluation was mostly measuring the amount of movement in the movie. We re-ran the tests using a modified version of the evaluation, hardcoding the dropped frame.

A few other XviD encodings show an abnormal loss of quality. We assume that they suffer from a similar problem, although we did not take the time to check. We discarded them as irrelevant; we believe that it does not change the substance of our results.
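The effect of a dropped frame on a frame-by-frame comparison, and the kind of workaround described above, can be sketched as follows. This is a hypothetical illustration in Python, not the actual modification: the function name and the way the dropped index is passed in are assumptions.

```python
def aligned_pairs(ref_frames, enc_frames, dropped=()):
    """Pair reference frames with encoded frames.

    `dropped` holds the indices (in the reference stream) of frames that
    the encoder is known to have dropped; those reference frames are
    skipped so that later frames stay aligned.
    """
    dropped = set(dropped)
    encoded = iter(enc_frames)
    for index, ref in enumerate(ref_frames):
        if index in dropped:
            continue  # no encoded counterpart for this reference frame
        try:
            enc = next(encoded)
        except StopIteration:
            return  # encoded stream ended early
        yield ref, enc


# Toy example: frame 1 of the reference was dropped by the encoder.
reference = ["r0", "r1", "r2", "r3"]
encoded = ["e0", "e2", "e3"]
print(list(aligned_pairs(reference, encoded, dropped=[1])))
```

Without the `dropped` argument, "r1" would be compared with "e2" and every later pair would be off by one frame, which is the misalignment that made the metric track motion rather than encoding quality.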

Analysis

There is no doubt: x264 wins. The result is quite tight at high quality, but the difference becomes greater at lower quality.

Remarks about XviD and lavc

For XviD and lavc, the choice of the quantizer type (MPEG or H.263) makes almost no difference, except at quantizer 1. Paradoxically, in that case, H.263 is better for lavc while MPEG is better for XviD.

For XviD, the cartoon option makes almost no difference either, and when it does, it is slightly for the worse. We believe that the effect of this option is mostly drowned out by the hqdn3d filter.

Influence of the prescaling

The parameter that has the biggest influence is obviously the prescaling. For quantizers up to 25, the native size is the best choice. Beyond that, shrinking becomes worthwhile. The lower the quality, the smaller the optimum size, of course, but the evolution is quite slow, and small image sizes are only necessary for very low quality.

For high quality, if rescaling to square pixels is necessary, the choice between shrinking the height and expanding the width is not totally straightforward, but it seems that shrinking the height is probably somewhat better.
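To give an idea of the sizes involved, the sketch below computes both square-pixel variants for the 696×416 crop. It assumes a pixel aspect ratio of 1024:720 (a 16:9 PAL frame of 720×576 shown as 1024×576), which is a common simplification and not a value stated in this article; the function name and the plain rounding are also assumptions, and real encodes usually need even dimensions.

```python
from fractions import Fraction

# Assumed pixel aspect ratio for 16:9 PAL: 720 stored pixels shown as 1024.
PIXEL_ASPECT = Fraction(1024, 720)


def square_pixel_options(width, height, par=PIXEL_ASPECT):
    """Return the two ways to reach square pixels:
    (expanded width, same height) and (same width, shrunk height)."""
    expand_width = (round(width * par), height)
    shrink_height = (width, round(height / par))
    return expand_width, shrink_height


if __name__ == "__main__":
    wide, short = square_pixel_options(696, 416)
    print("expand width: ", wide)
    print("shrink height:", short)
```

Under this assumption the display aspect ratio is about 2.38:1, which matches the 1280×538 size used for the HD evaluation.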

Temporary conclusion

If no other constraint prevents it, we recommend using the x264 codec, without prescaling for high and medium quality, and with a little shrinking for low quality.

Further experiments should focus on the effect of the multitude of options of the x264 codec, both on the quality and on the encoding time.

Note that if you want to use our curves to roughly choose a quantizer for an approximate bitrate, you must take the area of the image into account: for content that takes the whole 720×576 image and is not cropped, the same quantizer will probably give a 30% higher bitrate.

Appendices

Authors

In this article, “we” is Nicolas George. These experiments were conducted with the help of Gaëtan Leurent, Mehdi Tibouchi, David Madore and François Garillot for the choice of samples and the CPU time.

Software used

All encodings have been done with mencoder, SVN revision 21804 from 2007-01-01, with libavcodec SVN revision 7376.

The XviD codec used was the 20070101 snapshot.

The x264 codec used was the 20061231-2245 snapshot.

The evaluation was done with this piece of badly-written and undocumented code.


Last modified: 2007-03-19.