Cross-Domain Conditional Generative Adversarial Networks for Stereoscopic Hyperrealism in Surgical Training

Engelhardt, Sandy; Sharan, Lalith; Karck, Matthias; De Simone, Raffaele; Wolf, Ivo

Abstract:

Phantoms for surgical training are able to mimic cutting and suturing properties and the patient-individual shape of organs, but lack a realistic visual appearance that captures the heterogeneity of surgical scenes. To overcome this in endoscopic approaches, hyperrealistic concepts based on deep image-to-image transformation methods have been proposed for use in an augmented reality setting. Such concepts are able to generate realistic representations of phantoms learned from real intraoperative endoscopic sequences. Conditioned on frames from the surgical training process, the learned models are able to generate impressive results by transforming unrealistic parts of the image (e.g. the uniform phantom texture is replaced by the more heterogeneous texture of the tissue). Image-to-image synthesis usually learns a mapping $G: X \to Y$ such that the distribution of images from $G(X)$ is indistinguishable from the distribution $Y$. However, it does not necessarily force the generated images to be consistent and free of artifacts. In the endoscopic image domain, this can affect depth cues and the stereo consistency of a stereo image pair, which ultimately impairs surgical vision. We propose a cross-domain conditional generative adversarial network (GAN) approach that aims to generate more consistent stereo pairs. In an evaluation by 3 domain experts and 3 medical students on a 3D monitor, the results show substantial improvements in depth perception and realism over the baseline method. In 84 of 90 instances, our proposed method was preferred or rated equal to the baseline.
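
For reference, the mapping from X to Y mentioned in the abstract is commonly trained with the standard adversarial objective below, written here in CycleGAN-style notation with a discriminator $D_Y$ (the notation is our assumption, not taken from the paper). It is a sketch of the generic baseline formulation, not the cross-domain loss proposed by the authors.

```latex
\mathcal{L}_{\mathrm{GAN}}(G, D_Y) =
  \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\bigl[\log D_Y(y)\bigr]
  + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log\bigl(1 - D_Y(G(x))\bigr)\bigr],
\qquad
G^{*} = \arg\min_{G} \max_{D_Y} \mathcal{L}_{\mathrm{GAN}}(G, D_Y)
```

Each term depends on a single sample $x$, so if $G$ is applied to the left view $x_L$ and the right view $x_R$ separately, nothing in this objective links $G(x_L)$ to $G(x_R)$. This is one way to read the stereo-consistency gap described in the abstract.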

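To make "stereo consistency" concrete, the following hypothetical Python sketch measures how well a stereo pair agrees under a known disparity map. The function name, the assumption of rectified grayscale images stored as lists of rows, and the integer disparity map are all our own illustrative assumptions; the paper itself evaluated its results with human raters on a 3D monitor, not with this metric. If the two views are translated independently and their textures drift apart, corresponding pixels stop matching and the error grows.

```python
# Hypothetical illustration, not the authors' method.
# Assumes a rectified stereo pair of grayscale images (lists of rows) and an
# integer disparity map for the left view: left pixel (y, x) is assumed to
# correspond to right pixel (y, x - d).

def stereo_consistency_error(left, right, disparity):
    """Mean absolute difference between each left-view pixel and its
    disparity-shifted right-view correspondence.

    Pixels whose correspondence falls outside the right image are skipped.
    """
    total = 0.0
    count = 0
    for row_l, row_r, row_d in zip(left, right, disparity):
        for x, (value_l, d) in enumerate(zip(row_l, row_d)):
            x_r = x - d  # column of the matching pixel in the right view
            if 0 <= x_r < len(row_r):
                total += abs(value_l - row_r[x_r])
                count += 1
    if count == 0:
        raise ValueError("no valid correspondences between the two views")
    return total / count
```

For example, with the left row `[10, 20, 30, 40]`, the right row `[20, 30, 40, 50]` and a constant disparity of 1, every valid correspondence matches and the error is `0.0`; changing one right-view pixel from 30 to 35 raises it to 5/3.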
Reference:

Cross-Domain Conditional Generative Adversarial Networks for Stereoscopic Hyperrealism in Surgical Training (Sandy Engelhardt, Lalith Sharan, Matthias Karck, Raffaele De Simone, Ivo Wolf), In MICCAI 2019: 22nd International Conference on Medical Image Computing and Computer Assisted Intervention, 2019.

Bibtex Entry:

@inproceedings{engelhardt_cross-domain_2019,
	address = {Shenzhen, China},
	title = {Cross-{Domain} {Conditional} {Generative} {Adversarial} {Networks} for {Stereoscopic} {Hyperrealism} in {Surgical} {Training}},
	abstract = {Phantoms for surgical training are able to mimic cutting and suturing properties and the patient-individual shape of organs, but lack a realistic visual appearance that captures the heterogeneity of surgical scenes. To overcome this in endoscopic approaches, hyperrealistic concepts based on deep image-to-image transformation methods have been proposed for use in an augmented reality setting. Such concepts are able to generate realistic representations of phantoms learned from real intraoperative endoscopic sequences. Conditioned on frames from the surgical training process, the learned models are able to generate impressive results by transforming unrealistic parts of the image (e.g. the uniform phantom texture is replaced by the more heterogeneous texture of the tissue). Image-to-image synthesis usually learns a mapping $G: X \to Y$ such that the distribution of images from $G(X)$ is indistinguishable from the distribution $Y$. However, it does not necessarily force the generated images to be consistent and free of artifacts. In the endoscopic image domain, this can affect depth cues and the stereo consistency of a stereo image pair, which ultimately impairs surgical vision. We propose a cross-domain conditional generative adversarial network (GAN) approach that aims to generate more consistent stereo pairs. In an evaluation by 3 domain experts and 3 medical students on a 3D monitor, the results show substantial improvements in depth perception and realism over the baseline method. In 84 of 90 instances, our proposed method was preferred or rated equal to the baseline.},
	booktitle = {{MICCAI} 2019: 22nd {International} {Conference} on {Medical} {Image} {Computing} and {Computer} {Assisted} {Intervention}},
	author = {Engelhardt, Sandy and Sharan, Lalith and Karck, Matthias and De Simone, Raffaele and Wolf, Ivo},
	month = oct,
	year = {2019},
	keywords = {Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computers and Society, Electrical Engineering and Systems Science - Image and Video Processing},
	note = {arXiv:1906.10011}
}