Poznan 3D Codec

This page presents a technology for efficient coding of 3D video in Multi-view plus Depth (MVD) representation.

New 3D video systems will include glasses-free displays and will provide a realistic impression of depth as well as a controllable stereoscopic baseline distance. In such systems the description of a 3D scene should be richer than just a stereo pair. Some applications will need many views to be available at the receiver. For example, future autostereoscopic displays are expected to present as many as 50 different views simultaneously, corresponding to cameras with parallel optical axes equally spaced within an interval on the order of the human inter-ocular distance (about 64 millimeters). Such dense spacing yields strong similarity between neighboring views that can be exploited for compression. Moreover, many virtual views may be efficiently synthesized in the receiver using Depth-Image-Based Rendering (DIBR) [3,6,19], so for transmission the MVD format can often be limited to only 2-3 views accompanied by the corresponding depth maps [1]. In a realistic example of a system with an autostereoscopic display, only 3 views with 3 depth maps are transmitted (Fig. 1).

Fig. 1. An example of a 3D video system where 3 views with 3 depth maps are transmitted and used for synthesis of many virtual views.
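As a rough illustration of the geometry behind DIBR with parallel cameras, the sketch below converts an 8-bit depth-map sample to a horizontal disparity and lists equally spaced camera positions across a 64 mm interval. The depth quantization formula (sample 255 at the nearest plane, sample 0 at the farthest) is the convention commonly used for MVD test material and is an assumption here, as are the function names and all parameter values; none of them are taken from the codec itself.

```python
def depth_to_disparity(depth8, z_near, z_far, focal_px, baseline):
    """Convert one 8-bit depth sample to disparity in pixels.

    Assumes the common MVD convention: the sample stores normalized
    inverse depth, with 255 at z_near and 0 at z_far.
    """
    v = depth8 / 255.0
    inv_z = v * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    # For parallel cameras, disparity = focal length * baseline / Z.
    return focal_px * baseline * inv_z


def camera_positions(num_views, span_mm=64.0):
    """Positions (in mm) of equally spaced cameras centered on zero."""
    step = span_mm / (num_views - 1)
    return [i * step - span_mm / 2.0 for i in range(num_views)]


if __name__ == "__main__":
    # Hypothetical values: 1000 px focal length, 5 cm baseline, scene 1-10 m.
    print(depth_to_disparity(255, 1.0, 10.0, 1000.0, 0.05))
    print(camera_positions(50)[:3])
```

With 50 views spread over 64 mm, neighboring cameras are only about 1.3 mm apart, which is why the disparity between adjacent views is small and the views are so similar.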

The proposed technology is backward compatible with the HEVC standard, so one of the views, called the base view, can be decoded by a legacy HEVC decoder (video only). The remaining data (video and depth) can only be decoded by the 3D decoder, because additional syntax structures are used in the bitstream (Fig. 2).
For both video and depth maps, a hierarchical view coding structure similar to MVC is used: the already coded views serve as references for prediction of the subsequent views. Three main inter-view prediction mechanisms are used:

view synthesis prediction with Disoccluding Region Coding,
disparity-compensation prediction (MVC-like),
depth-based motion prediction (DBMP).

The main idea of the proposed coding technology is to exploit view-synthesis prediction as much as possible. The base view (the HEVC-compatible view) and its depth map are coded directly, i.e. without any inter-view prediction. The side views (video and depth) are synthesized from the base view. Then, in the side views, the disoccluded regions (those hidden by occlusion in the base view) are identified. Only the disoccluded regions of the side views are coded. Coding of the side views also takes advantage of the other inter-view prediction modes: disparity compensation and DBMP.
The camera parameters are compressed and transmitted together with the video and depth maps in a single bitstream.
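The idea of synthesizing a side view and identifying its disoccluded regions can be sketched on a single scanline. The code below is a simplified illustration only, not the codec's actual algorithm: it assumes rectified parallel cameras, integer-rounded disparities, a shift direction of `x - disparity`, and hypothetical function names. Positions in the side view that receive no sample from the base view are marked as disoccluded, and only those positions would need to be coded.

```python
def synthesize_row(base_row, disparity_row):
    """Forward-warp one base-view scanline into a side view.

    Returns (synth_row, disoccluded). disoccluded[x] is True where no
    base-view sample landed, i.e. content hidden in the base view.
    """
    width = len(base_row)
    synth = [None] * width
    best_disp = [-1.0] * width  # larger disparity means closer to the camera
    for x in range(width):
        d = disparity_row[x]
        tx = x - int(round(d))  # assumed shift direction for this side view
        # Keep the closest sample when several map to the same position.
        if 0 <= tx < width and d > best_disp[tx]:
            synth[tx] = base_row[x]
            best_disp[tx] = d
    disoccluded = [s is None for s in synth]
    return synth, disoccluded


def samples_to_code(side_row, disoccluded):
    """Select the side-view samples that fall in disoccluded positions."""
    return [(x, s) for x, (s, m) in enumerate(zip(side_row, disoccluded)) if m]


if __name__ == "__main__":
    base = [10, 11, 12, 13, 14, 15, 16, 17]
    disp = [0, 0, 0, 0, 2, 2, 0, 0]  # a foreground object at x = 4..5
    synth, mask = synthesize_row(base, disp)
    print(synth)
    print(mask)
    side = [10, 11, 14, 15, 30, 31, 16, 17]  # hypothetical true side view
    print(samples_to_code(side, mask))
```

In this toy example the foreground object moves two samples to the left, uncovering two background positions that the base view never saw; those are the only samples selected for coding.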

Fig. 2. Proposed codec structure for 3-view MVD.

The detailed technical description can be found in [1].

In October 2011, the codec was submitted as a proposal in response to the Call for Proposals (CfP) [2] issued by the Moving Picture Experts Group (MPEG) on behalf of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). After submission, all proposals were extensively tested both subjectively and objectively, and the proposal from Poznan University of Technology was rated as one of the best performing.

Even after the resolution of the CfP, we have continued to develop and assess our codec in finer detail. The objective and subjective experiments that we have performed [3,4,5] confirmed the results obtained by MPEG and also provided an in-depth look into the performance of our technology.

The evaluation methodology can be found in [2,6].

Table 1. Input view positions and synthesized output view positions for the 2-view and 3-view configurations.

Sequence	2-view input	3-view input	Views to synthesize in the 2-view test scenario (stereo pair)	Views to synthesize in the 3-view test scenario (stereo pair)
Poznan_Hall2	7-6	7-6-5	6.5 (6.5-6)	6.125-5.875 (6.125-5.875)
Poznan_Street	4-3	5-4-3	3.5 (3.5-3)	4.125-3.875 (4.125-3.875)
Undo_Dancer	2-5	1-5-9	3 (3-5)	4.5-5.5 (4.5-5.5)
GT_Fly	5-2	9-5-1	4 (4-2)	5.5-4.5 (5.5-4.5)
Kendo	3-5	1-3-5	4 (4-5)	2.75-3.25 (2.75-3.25)
Balloons	3-5	1-3-5	4 (4-5)	2.75-3.25 (2.75-3.25)
Lovebird1	6-8	4-6-7	7 (7-8)	5.75-6.25 (5.75-6.25)
Newspaper	4-6	2-4-6	5 (5-6)	3.75-4.25 (3.75-4.25)
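The fractional view numbers in Table 1 can be read as positions between the real cameras. The sketch below assumes that a fractional index is a linear interpolation between the two neighboring camera positions, which is an interpretation rather than something stated on this page; the camera coordinates and the function name are hypothetical.

```python
import math


def virtual_camera_x(view_pos, camera_x):
    """Interpolate the position of a virtual view between real cameras.

    camera_x maps integer view numbers to camera x-coordinates.
    Assumes linear interpolation between the two nearest cameras.
    """
    lo = math.floor(view_pos)
    hi = math.ceil(view_pos)
    if lo == hi:
        return camera_x[lo]
    w = view_pos - lo
    return (1.0 - w) * camera_x[lo] + w * camera_x[hi]


if __name__ == "__main__":
    # Hypothetical camera positions (arbitrary units) for views 5, 6 and 7.
    cams = {5: 0.0, 6: 10.0, 7: 20.0}
    print(virtual_camera_x(6.5, cams))    # 2-view case: midway between 7 and 6
    print(virtual_camera_x(6.125, cams))  # 3-view case: stereo pair around view 6
    print(virtual_camera_x(5.875, cams))
```

Under this reading, the 3-view stereo pair for Poznan_Hall2 (6.125 and 5.875) spans a quarter of the spacing between neighboring cameras, centered on the middle view.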

Below you can find the results of coding with our technology in the 2-view and 3-view cases for all 8 sequences defined in the CfP.
The results include:

stereoscopic (side-by-side) AVI files of 2 synthesized middle views,
bitstreams with an executable decoder in a RAR archive.

The provided stereoscopic pair of views was synthesized from the decoded video and depth maps obtained from the provided bitstreams.
The exact positions of the input and output views can be found in Table 1. The views provided in the AVI files are the ones that were assessed during the formal subjective evaluation of the proposals.
A downloaded AVI file can be viewed with dedicated 3D software such as Stereoscopic Player.

Along with the bitstreams and the executable decoder, a batch file is provided that can be used to decode the bitstreams. The decoder executable is built for 64-bit Windows (executables for other platforms can be obtained upon request). The decoder outputs the reconstructed video and depth maps, along with camera parameters that can be used for view synthesis.

Fig. 3. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 1950 kbps.

Fig. 4. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 1180 kbps.

Fig. 5. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 710 kbps.

Fig. 6. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 410 kbps.

Table 2. AVI files with the stereo pair synthesized from decoded video and depth maps, and bitstreams with the executable decoder, for the 3-view case.

Sequence	AVI	Bitstream
Poznan Hall 2	770 kbps 480 kbps 310 kbps 210 kbps	770 kbps 480 kbps 310 kbps 210 kbps
Poznan Street	1950 kbps 1180 kbps 710 kbps 410 kbps	1950 kbps 1180 kbps 710 kbps 410 kbps
Undo Dancer	2010 kbps 1200 kbps 780 kbps 430 kbps	2010 kbps 1200 kbps 780 kbps 430 kbps
GT Fly	1600 kbps 1080 kbps 600 kbps 340 kbps	1600 kbps 1080 kbps 600 kbps 340 kbps
Kendo	1040 kbps 670 kbps 430 kbps 280 kbps	1040 kbps 670 kbps 430 kbps 280 kbps
Balloons	1200 kbps 770 kbps 480 kbps 300 kbps	1200 kbps 770 kbps 480 kbps 300 kbps
Newspaper	900 kbps 680 kbps 450 kbps 340 kbps	900 kbps 680 kbps 450 kbps 340 kbps
Lovebird 1	1270 kbps 730 kbps 420 kbps 260 kbps	1270 kbps 730 kbps 420 kbps 260 kbps

Table 3. AVI files with the stereo pair synthesized from decoded video and depth maps, and bitstreams with the executable decoder, for the 2-view case.

Sequence	AVI	Bitstream
Poznan Hall 2	520 kbps 320 kbps 210 kbps 140 kbps	520 kbps 320 kbps 210 kbps 140 kbps
Poznan Street	1310 kbps 800 kbps 480 kbps 280 kbps	1310 kbps 800 kbps 480 kbps 280 kbps
Undo Dancer	1000 kbps 710 kbps 430 kbps 290 kbps	1000 kbps 710 kbps 430 kbps 290 kbps
GT Fly	1100 kbps 730 kbps 400 kbps 230 kbps	1100 kbps 730 kbps 400 kbps 230 kbps
Kendo	690 kbps 480 kbps 360 kbps 230 kbps	690 kbps 480 kbps 360 kbps 230 kbps
Balloons	800 kbps 520 kbps 350 kbps 250 kbps	800 kbps 520 kbps 350 kbps 250 kbps
Newspaper	720 kbps 480 kbps 360 kbps 230 kbps	720 kbps 480 kbps 360 kbps 230 kbps
Lovebird 1	830 kbps 480 kbps 300 kbps 220 kbps	830 kbps 480 kbps 300 kbps 220 kbps

References

[1] Marek Domański, Tomasz Grajek, Damian Karwowski, Krzysztof Klimaszewski, Jacek Konieczny, Maciej Kurc, Adam Łuczak, Robert Ratajczak, Jakub Siast, Olgierd Stankiewicz, Jakub Stankowski, Krzysztof Wegner

“Technical Description of Poznan University of Technology proposal for Call on 3D Video Coding Technology”

ISO/IEC JTC1/SC29/WG11, MPEG 2011 / M22697, Geneva, Switzerland, November 2011

[2] “Call for Proposals on 3D Video Coding Technology”

ISO/IEC JTC1/SC29/WG11 MPEG2011/N12036, Geneva, Switzerland, March 2011

[3] Marek Domański, Jacek Konieczny, Maciej Kurc, Robert Ratajczak, Jakub Siast, Olgierd Stankiewicz, Jakub Stankowski, Krzysztof Wegner

“3D video compression by coding of disoccluded regions”

IEEE International Conference on Image Processing (ICIP), Orlando, USA, 30 September - 3 October 2012

[4] Marek Domański, Tomasz Grajek, Damian Karwowski, Krzysztof Klimaszewski, Jacek Konieczny, Maciej Kurc, Adam Łuczak, Robert Ratajczak, Jakub Siast, Olgierd Stankiewicz, Jakub Stankowski, Krzysztof Wegner

“New Coding Technology For 3D Video With Depth Maps As Proposed For Standardization Within MPEG”

19th International Conference on Systems, Signals and Image Processing (IWSSIP), Vienna, Austria, 11-13 April 2012

[5] Marek Domański, Tomasz Grajek, Damian Karwowski, Krzysztof Klimaszewski, Jacek Konieczny, Maciej Kurc, Adam Łuczak, Robert Ratajczak, Jakub Siast, Olgierd Stankiewicz, Jakub Stankowski, Krzysztof Wegner

“Coding of Multiple Video+Depth Using HEVC Technology and Reduced Representations of Side Views and Depth Maps”

Picture Coding Symposium, Kraków, Poland, 2012

[6] Filip Lewandowski, Mateusz Paluszkiewicz, Tomasz Grajek, Krzysztof Wegner

“Subjective Quality Assessment Methodology for 3D Video Compression Technology”

IEEE International Conference on Signals and Electronic Systems (ICSES 2012), Wrocław, Poland, September 2012
