Poznan 3D Codec

This page presents a technology for efficient coding of 3D video in Multi-view plus Depth (MVD) representation.

New 3D video systems will include glasses-free displays and will provide a realistic impression of depth as well as a controllable stereoscopic baseline distance. In such systems, the description of a 3D scene has to be richer than just a stereo pair, because some applications will need many views to be available at the receiver. For example, future autostereoscopic displays are expected to present as many as 50 different views simultaneously, corresponding to cameras with parallel optical axes, equally spaced within an interval on the order of the human inter-ocular distance (about 64 millimeters). Such dense spacing yields strong similarity between neighboring views, which can be exploited for compression. Moreover, many virtual views may be efficiently synthesized in the receiver using Depth-Image-Based Rendering (DIBR) [3,6,19], so the transmitted MVD data may often be limited to only 2-3 views accompanied by the corresponding depth maps [1]. In a realistic example of a system with an autostereoscopic display, only 3 views with 3 depth maps are transmitted (Fig. 1).
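DIBR needs, for every depth-map sample, the horizontal shift (disparity) of that point between views. The sketch below assumes a common MVD convention in which 8-bit depth levels quantize 1/Z uniformly between a nearest depth Z_near (level 255) and a farthest depth Z_far (level 0), and rectified cameras with parallel optical axes, for which a point at depth Z shifts by f·B/Z pixels (f: focal length in pixels, B: camera spacing). The function names and the example camera values are hypothetical, not taken from the codec.

```python
def depth_level_to_z(level: int, z_near: float, z_far: float, bits: int = 8) -> float:
    """Map a quantized depth-map sample to scene depth Z.

    Assumed convention: 1/Z (not Z) is quantized uniformly, so the maximum
    level maps to z_near and level 0 maps to z_far.
    """
    max_level = (1 << bits) - 1
    inv_z = (level / max_level) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z


def level_to_disparity(level: int, focal_px: float, baseline: float,
                       z_near: float, z_far: float, bits: int = 8) -> float:
    """Horizontal disparity in pixels between two rectified, parallel cameras.

    A point at depth Z is shifted by focal_px * baseline / Z pixels; baseline
    and the depth range must use the same length unit (metres here).
    """
    return focal_px * baseline / depth_level_to_z(level, z_near, z_far, bits)


if __name__ == "__main__":
    # Hypothetical camera: f = 1000 px, 64 mm spacing, scene between 2 m and 10 m.
    for level in (0, 128, 255):
        d = level_to_disparity(level, focal_px=1000.0, baseline=0.064,
                               z_near=2.0, z_far=10.0)
        print(level, round(d, 2))
```

Because 1/Z is linear in the depth level, the disparity is linear in it as well: with these example values it grows from 6.4 px at level 0 to 32 px at level 255.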


Fig. 1. An example of a 3D video system where 3 views with 3 depth maps are transmitted and used for synthesis of many virtual views.

The proposed technology is backward compatible with the HEVC standard: one of the views, called the base view, can be decoded by a legacy HEVC decoder (video only). The remaining data (video and depth) can be decoded only by the 3D decoder, because additional syntax structures are used in the bitstream (Fig. 2).
For both videos and depth maps, a hierarchical view coding structure similar to that of MVC is used: the already coded views serve as references for predicting the subsequent views. Three main inter-view prediction mechanisms are used:

  • view synthesis prediction with Disoccluding Region Coding,
  • disparity-compensation prediction (MVC-like),
  • depth-based motion prediction (DBMP).
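Disparity-compensated prediction works like motion compensation, except that the reference is an already decoded neighboring view rather than a previous frame: a block of the current view is predicted from a block of the reference view displaced by a disparity vector. For rectified cameras with parallel axes the displacement is purely horizontal, so the search can be one-dimensional. The sketch below is a generic full search with the sum of absolute differences (SAD) as the cost, assuming that geometry; it is not the codec's actual search, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # luma samples, indexed as image[row][column]


def block_sad(cur: Image, ref: Image, y: int, x: int, dx: int, size: int) -> int:
    """SAD between the size x size block of `cur` at (y, x) and the block of
    `ref` displaced horizontally by dx."""
    return sum(
        abs(cur[y + j][x + i] - ref[y + j][x + i + dx])
        for j in range(size)
        for i in range(size)
    )


def search_disparity(cur: Image, ref: Image, y: int, x: int,
                     size: int, max_disp: int) -> Optional[Tuple[int, int]]:
    """Full search over horizontal disparities in [-max_disp, max_disp].

    Returns (disparity, SAD) of the best match, or None if no candidate block
    fits inside the reference view. Only horizontal shifts are tried because
    rectified, parallel cameras produce purely horizontal disparity.
    """
    width = len(ref[0])
    best: Optional[Tuple[int, int]] = None
    for dx in range(-max_disp, max_disp + 1):
        if x + dx < 0 or x + dx + size > width:
            continue  # candidate block would extend outside the reference view
        cost = block_sad(cur, ref, y, x, dx, size)
        if best is None or cost < best[1]:
            best = (dx, cost)
    return best


if __name__ == "__main__":
    # Synthetic check: the "side view" equals the reference view shifted by 3 pixels.
    W, H = 32, 8
    ref = [[5 * c + r for c in range(W)] for r in range(H)]
    cur = [[5 * (c + 3) + r for c in range(W)] for r in range(H)]
    print(search_disparity(cur, ref, y=0, x=8, size=4, max_disp=6))  # (3, 0)
```

In an encoder the prediction residual and the chosen disparity vector would then be coded; a real search would typically also weigh the cost of coding the vector, which this sketch omits.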

The main idea of the proposed coding technology is to exploit view-synthesis prediction as much as possible. The base view (the HEVC-compatible view) and its depth map are coded directly, i.e. without any inter-view prediction. The side views (video and depth) are synthesized from the base view. Then, the disoccluded regions of the side views (regions that are occluded, and hence not visible, in the base view) are identified, and only these regions of the side views are coded. Coding of the side views also takes advantage of the other inter-view prediction modes: disparity compensation and DBMP.
The camera parameters are compressed and transmitted together with the videos and depth maps in a single bitstream.
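To see where disoccluded regions come from, consider warping one row of the base view into a side view with per-pixel disparities: background that a foreground object hid in the base view has no source pixel and remains empty. The sketch below assumes integer, non-negative disparities and a shift towards increasing column index; it uses a simple z-buffer so that the closer pixel (larger disparity) wins when two pixels land on the same column. It only illustrates how such a mask can be obtained and is not the codec's synthesis algorithm; names are hypothetical.

```python
from typing import List, Optional, Tuple


def warp_row(texture: List[int],
             disparity: List[int]) -> Tuple[List[Optional[int]], List[bool]]:
    """Forward-warp one row of the base view into a side view.

    disparity[x] is the integer shift (in pixels, assumed >= 0) of base-view
    pixel x; a larger disparity means the point is closer to the cameras.
    Returns the warped row, with None where no base-view pixel landed, and a
    mask that is True at those disoccluded positions.
    """
    width = len(texture)
    warped: List[Optional[int]] = [None] * width
    depth_buffer = [-1] * width  # disparity of the pixel currently stored per column
    for x in range(width):
        target = x + disparity[x]
        # Z-buffer test: when two pixels land on the same column, the closer one wins.
        if 0 <= target < width and disparity[x] > depth_buffer[target]:
            warped[target] = texture[x]
            depth_buffer[target] = disparity[x]
    holes = [sample is None for sample in warped]
    return warped, holes


if __name__ == "__main__":
    # Background (disparity 0) with a foreground object at columns 3..5 (disparity 2).
    row = [10, 11, 12, 90, 91, 92, 16, 17]
    disp = [0, 0, 0, 2, 2, 2, 0, 0]
    warped, holes = warp_row(row, disp)
    print(warped)  # [10, 11, 12, None, None, 90, 91, 92]
    print(holes)   # [False, False, False, True, True, False, False, False]
```

Columns 3 and 4 are the disoccluded samples that, in the scheme described above, would have to be coded explicitly, while the background samples 16 and 17 become occluded in the side view. With real, fractional disparities, warping also leaves one-pixel cracks that are usually filled by interpolation rather than treated as disocclusions.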


Fig. 2. Proposed codec structure for 3-view MVD.

The detailed technical description can be found in [1].

In October 2011, the codec was submitted as a proposal in response to the Call for Proposals [2] issued by the Moving Picture Experts Group (MPEG) on behalf of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). After the submission, all proposals were extensively tested, both subjectively and objectively, and the proposal from Poznan University of Technology was ranked among the best performing.

After the CfP was resolved, we continued to develop and assess our codec in finer detail. The objective and subjective experiments that we performed [3,4,5] confirmed the results obtained by MPEG and also provided an in-depth look into the performance of our technology.

The evaluation methodology is described in [2,6].

Table 1. Positions of the input views and of the synthesized output views for the 2-view and 3-view configurations.

Sequence | 2-view input | 3-view input | Views to synthesize, 2-view scenario (stereo pair) | Views to synthesize, 3-view scenario (stereo pair)
Poznan_Hall2 | 7-6 | 7-6-5 | 6.5 (6.5-6) | 6.125-5.875 (6.125-5.875)
Poznan_Street | 4-3 | 5-4-3 | 3.5 (3.5-3) | 4.125-3.875 (4.125-3.875)
Undo_Dancer | 2-5 | 1-5-9 | 3 (3-5) | 4.5-5.5 (4.5-5.5)
GT_Fly | 5-2 | 9-5-1 | 4 (4-2) | 5.5-4.5 (5.5-4.5)
Kendo | 3-5 | 1-3-5 | 4 (4-5) | 2.75-3.25 (2.75-3.25)
Balloons | 3-5 | 1-3-5 | 4 (4-5) | 2.75-3.25 (2.75-3.25)
Lovebird1 | 6-8 | 4-6-7 | 7 (7-8) | 5.75-6.25 (5.75-6.25)
Newspaper | 4-6 | 2-4-6 | 5 (5-6) | 3.75-4.25 (3.75-4.25)
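The fractional positions in Table 1 place each virtual view between two transmitted views. The sketch below finds those two views and a distance-based blending weight for a given position, assuming that view numbers are proportional to camera positions along a straight, equally spaced baseline. Blending by distance is one common choice, assumed here rather than taken from the codec; the function names are hypothetical.

```python
from typing import List, Tuple


def reference_views(position: float, inputs: List[int]) -> Tuple[int, int, float]:
    """Find the two input views that enclose a virtual view position.

    Returns (left, right, alpha), where alpha is the fractional distance from
    `left` towards `right`. Assumes view numbers are proportional to camera
    positions along a straight, equally spaced baseline.
    """
    views = sorted(inputs)
    for left, right in zip(views, views[1:]):
        if left <= position <= right:
            return left, right, (position - left) / (right - left)
    raise ValueError(f"virtual view {position} lies outside input views {views}")


def blend_weights(alpha: float) -> Tuple[float, float]:
    """Distance-based blending: the nearer reference view gets the larger weight."""
    return 1.0 - alpha, alpha


if __name__ == "__main__":
    # Poznan_Hall2, 3-view input 7-6-5, stereo pair 6.125 / 5.875 (Table 1).
    for pos in (6.125, 5.875):
        left, right, alpha = reference_views(pos, [7, 6, 5])
        print(pos, (left, right), blend_weights(alpha))
```

For the Poznan_Hall2 3-view case, view 6.125 is synthesized mainly from view 6 (weight 0.875) and view 5.875 mainly from view 6 as well, so the two views of the stereo pair are a quarter of the camera spacing apart.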

Below you can find the results of coding with our technology in the 2-view and 3-view cases for all 8 sequences defined in the CfP.
The results include:

  • stereoscopic (side-by-side) AVI files with the 2 synthesized middle views,
  • bitstreams with an executable decoder in a RAR archive.

The provided stereoscopic pair of views was synthesized from the video and depth maps decoded from the provided bitstreams.
The exact positions of the input and output views are given in Table 1. The views provided in the AVI files are the ones that were evaluated during the formal subjective evaluation of the proposals.
The downloaded AVI files can be viewed with dedicated 3D software such as Stereoscopic Player.

Along with the bitstreams and the executable decoder, a batch file is provided that can be used to decode the bitstreams. The executable decoder is built for 64-bit Windows (decoder executables for other platforms can be obtained upon request). The decoder outputs the reconstructed video and depth maps, along with the camera parameters that can be used for view synthesis.
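To use the decoded views and depth maps for view synthesis, the output files have to be read frame by frame. This page does not state the output format; the sketch below assumes raw planar 8-bit YUV 4:2:0 files with even dimensions (the usual format of the MPEG test material, where a depth map is carried in the luma plane). The function name is hypothetical.

```python
from typing import List, Tuple

Plane = List[List[int]]


def read_yuv420_frame(data: bytes, width: int, height: int,
                      index: int) -> Tuple[Plane, Plane, Plane]:
    """Return the Y, U and V planes of frame `index` from raw planar 8-bit YUV 4:2:0 data.

    Assumes even width and height: each chroma plane is (width/2) x (height/2).
    """
    chroma_w, chroma_h = width // 2, height // 2
    luma_size = width * height
    chroma_size = chroma_w * chroma_h
    frame_size = luma_size + 2 * chroma_size
    start = index * frame_size
    if index < 0 or start + frame_size > len(data):
        raise IndexError(f"frame {index} is not fully contained in the data")

    def plane(offset: int, w: int, h: int) -> Plane:
        # Split a contiguous block of w * h bytes into h rows of w samples.
        return [list(data[offset + r * w: offset + (r + 1) * w]) for r in range(h)]

    y = plane(start, width, height)
    u = plane(start + luma_size, chroma_w, chroma_h)
    v = plane(start + luma_size + chroma_size, chroma_w, chroma_h)
    return y, u, v
```

For example, the bytes of a decoded depth file can be obtained with `open(path, "rb").read()` and passed together with the sequence resolution; for a depth map only the returned Y plane is used. For long sequences it is cheaper to seek to `index * frame_size` and read a single frame instead of loading the whole file.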

Fig. 3. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 1950 kbps.

Fig. 4. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 1180 kbps.

Fig. 5. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 710 kbps.

Fig. 6. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 410 kbps.

Table 2. AVI files with the stereo pair synthesized from the decoded video and depth maps, and bitstreams with the executable decoder, for the 3-view case.

Sequence | AVI (synthesized stereo pair) | Bitstream and decoder (RAR)
Poznan Hall 2 | 770, 480, 310, 210 kbps | 770, 480, 310, 210 kbps
Poznan Street | 1950, 1180, 710, 410 kbps | 1950, 1180, 710, 410 kbps
Undo Dancer | 2010, 1200, 780, 430 kbps | 2010, 1200, 780, 430 kbps
GT Fly | 1600, 1080, 600, 340 kbps | 1600, 1080, 600, 340 kbps
Kendo | 1040, 670, 430, 280 kbps | 1040, 670, 430, 280 kbps
Balloons | 1200, 770, 480, 300 kbps | 1200, 770, 480, 300 kbps
Newspaper | 900, 680, 450, 340 kbps | 900, 680, 450, 340 kbps
Lovebird 1 | 1270, 730, 420, 260 kbps | 1270, 730, 420, 260 kbps

Table 3. AVI files with the stereo pair synthesized from the decoded video and depth maps, and bitstreams with the executable decoder, for the 2-view case.

Sequence | AVI (synthesized stereo pair) | Bitstream and decoder (RAR)
Poznan Hall 2 | 520, 320, 210, 140 kbps | 520, 320, 210, 140 kbps
Poznan Street | 1310, 800, 480, 280 kbps | 1310, 800, 480, 280 kbps
Undo Dancer | 1000, 710, 430, 290 kbps | 1000, 710, 430, 290 kbps
GT Fly | 1100, 730, 400, 230 kbps | 1100, 730, 400, 230 kbps
Kendo | 690, 480, 360, 230 kbps | 690, 480, 360, 230 kbps
Balloons | 800, 520, 350, 250 kbps | 800, 520, 350, 250 kbps
Newspaper | 720, 480, 360, 230 kbps | 720, 480, 360, 230 kbps
Lovebird 1 | 830, 480, 300, 220 kbps | 830, 480, 300, 220 kbps

References

  • [1] Marek Domański, Tomasz Grajek, Damian Karwowski, Krzysztof Klimaszewski, Jacek Konieczny, Maciej Kurc, Adam Łuczak, Robert Ratajczak, Jakub Siast, Olgierd Stankiewicz, Jakub Stankowski, Krzysztof Wegner

    “Technical Description of Poznan University of Technology proposal for Call on 3D Video Coding Technology”

    ISO/IEC JTC1/SC29/WG11 MPEG2011/M22697, Geneva, Switzerland, November 2011
  • [2] “Call for Proposals on 3D Video Coding Technology”

    ISO/IEC JTC1/SC29/WG11 MPEG2011/N12036, Geneva, Switzerland, March 2011
  • [3] Marek Domański, Jacek Konieczny, Maciej Kurc, Robert Ratajczak, Jakub Siast, Olgierd Stankiewicz, Jakub Stankowski, Krzysztof Wegner

    “3D video compression by coding of disoccluded regions”

    IEEE International Conference on Image Processing (ICIP), Orlando, USA, 30 September – 3 October 2012
  • [4] Marek Domański, Tomasz Grajek, Damian Karwowski, Krzysztof Klimaszewski, Jacek Konieczny, Maciej Kurc, Adam Łuczak, Robert Ratajczak, Jakub Siast, Olgierd Stankiewicz, Jakub Stankowski, Krzysztof Wegner

    “New Coding Technology For 3D Video With Depth Maps As Proposed For Standardization Within MPEG”

    19th International Conference on Systems, Signals and Image Processing (IWSSIP), Vienna, Austria, 11-13 April 2012
  • [5] Marek Domański, Tomasz Grajek, Damian Karwowski, Krzysztof Klimaszewski, Jacek Konieczny, Maciej Kurc, Adam Łuczak, Robert Ratajczak, Jakub Siast, Olgierd Stankiewicz, Jakub Stankowski, Krzysztof Wegner

    “Coding of Multiple Video+Depth Using HEVC Technology and Reduced Representations of Side Views and Depth Maps”

    Picture Coding Symposium, Kraków, Poland, 2012
  • [6] Filip Lewandowski, Mateusz Paluszkiewicz, Tomasz Grajek, Krzysztof Wegner

    “Subjective Quality Assessment Methodology for 3D Video Compression Technology”

    IEEE International Conference on Signals and Electronic Systems (ICSES 2012), Wrocław, Poland, September 2012