Physics Based Image Deshadowing Using Local Linear Model


Tamir Einy1*, Efrat Immer2*, Gilad Vered1, Shai Avidan2

Applied Materials1,    Tel-Aviv University2

{tamireiny, efratimmer, vgilad}@gmail.com, avidan@eng.tau.ac.il

*Denotes equal contribution

Abstract

Image deshadowing algorithms remove shadows from images. This requires both detecting where the shadow is and, once detected, removing it from the image. This work focuses on the shadow removal part. We follow a common physical shadow formation model and learn its parameters using a deep neural network. The problem is challenging because the model assumes six parameters per pixel, which makes the problem ill-posed. Common deshadowing methods use neural networks to estimate a simplified model that requires only three parameters per pixel. This leads to good results, but close inspection of real images reveals its limitations. In contrast, we estimate the full model (i.e., six parameters per pixel) directly. Remarkably, our deshadowing network is considerably smaller than alternative methods, yet produces better results on standard datasets.
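A minimal worked example of the two formation models contrasted above, under the common assumption that the full local linear model relights each color channel of a shadow pixel with a per-channel gain w and offset b (six parameters per pixel), while the simplified model keeps only the gain (three parameters per pixel); all values below are illustrative, not from the paper:

    # Illustrative values only; the parameterization (gain w, offset b per channel) is assumed.
    shadow_pixel = [0.12, 0.10, 0.08]   # R, G, B intensities in [0, 1]
    w = [2.1, 2.3, 2.6]                 # per-channel gain
    b = [0.05, 0.04, 0.03]              # per-channel offset
    full = [wi * p + bi for wi, p, bi in zip(w, shadow_pixel, b)]  # 6 parameters per pixel
    simplified = [wi * p for wi, p in zip(w, shadow_pixel)]        # 3 parameters per pixel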

Estimated Shadow Coefficient Maps

The figure below shows an example of the shadow coefficient maps w and b estimated by our network for a single color channel (red). As can be seen, the coefficient maps are piecewise smooth, and the values of w and b depend on the pixel's color and spatial location. For example, higher values of w appear in the upper part of the shadow, close to the occluding figure, where the shadow intensity is greater.
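The coefficient maps are applied per pixel and per channel. Below is a minimal sketch of this step, assuming the 6-channel network output stacks the three gain channels w before the three offset channels b, and that the relit result is blended back into the image through the shadow mask (function and argument names are ours):

    import numpy as np

    def apply_coefficients(shadow_img, coeffs, mask):
        """shadow_img: (H, W, 3) in [0, 1]; coeffs: (H, W, 6); mask: (H, W) in {0, 1}."""
        w, b = coeffs[..., :3], coeffs[..., 3:]   # per-pixel, per-channel gain and offset
        relit = w * shadow_img + b                # local linear model
        m = mask[..., None]
        # Relight only the shadow pixels; keep the rest of the image unchanged.
        return np.clip(m * relit + (1.0 - m) * shadow_img, 0.0, 1.0)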

Shadow Mask Fine-Tuning

We estimate the shadow masks using the BDRAR shadow detection network of Zhu et al. [1]. We fine-tuned the original model on the ISTD+ training set and evaluated the results with IoU (intersection over union) and BER (balanced error rate). Fine-tuning increased the mean IoU on the ISTD+ test set from 0.794 to 0.91, while the BER dropped from 5.61 to 1.94. Below we show an example of the shadow mask estimation before and after fine-tuning. For this image, fine-tuning increased the IoU from 0.06 to 0.95 and reduced the BER from 11.4 to 0.75.
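For reference, here is a sketch of the two metrics on binary masks, using the standard IoU definition and one common definition of BER as a percentage (we assume this is the convention behind the numbers above):

    import numpy as np

    def iou_and_ber(pred, gt):
        """pred, gt: boolean shadow masks of the same shape."""
        tp = np.logical_and(pred, gt).sum()
        tn = np.logical_and(~pred, ~gt).sum()
        fp = np.logical_and(pred, ~gt).sum()
        fn = np.logical_and(~pred, gt).sum()
        iou = tp / (tp + fp + fn)
        ber = 100.0 * (1.0 - 0.5 * (tp / (tp + fn) + tn / (tn + fp)))
        return iou, ber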

[1] Lei Zhu, Zijun Deng, Xiaowei Hu, Chi-Wing Fu, Xuemiao Xu, Jing Qin, and Pheng-Ann Heng. Bidirectional Feature Pyramid Network with Recurrent Attention Residual Modules for Shadow Detection. ECCV 2018.

Network Architecture

Our network is based on the multi-scale context aggregation network (CAN32 configuration) developed by Chen et al. [1] in the context of semantic image analysis. The network is fully convolutional, with 10 layers. Except for the last layer, each convolutional layer is followed by adaptive normalization and a leaky ReLU. The parameters of the convolutions are summarized in the table below, and a sketch of the architecture follows the table. As can be seen, the network uses convolutions with a growing dilation rate. The dilation lets the network aggregate information from a large receptive field (513 x 513) while using a small number of parameters. The input to the network is a 4-channel tensor containing the shadow image and the shadow mask; the output is a 6-channel tensor containing the shadow coefficient maps at the same resolution as the input.

[1] Qifeng Chen, Jia Xu, and Vladlen Koltun. Fast Image Processing with Fully-Convolutional Networks. ICCV 2017.
Layer | Convolution | Dilation | Receptive Field
------|-------------|----------|----------------
  1   |    3 x 3    |     1    |     3 x 3
  2   |    3 x 3    |     2    |     7 x 7
  3   |    3 x 3    |     4    |    15 x 15
  4   |    3 x 3    |     8    |    31 x 31
  5   |    3 x 3    |    16    |    63 x 63
  6   |    3 x 3    |    32    |   127 x 127
  7   |    3 x 3    |    64    |   255 x 255
  8   |    3 x 3    |   128    |   511 x 511
  9   |    3 x 3    |     1    |   513 x 513
 10   |    1 x 1    |     1    |   513 x 513
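A PyTorch sketch of this architecture, following the table above. The channel width (32, per the CAN32 configuration), the leaky ReLU slope (0.2), and the adaptive normalization formulation (a learned blend of the identity and batch normalization) are our reading of Chen et al., not values stated in this document:

    import torch
    import torch.nn as nn

    class AdaptiveNorm(nn.Module):
        """Adaptive normalization: a learned affine blend of identity and batch norm."""
        def __init__(self, channels):
            super().__init__()
            self.w0 = nn.Parameter(torch.tensor(1.0))
            self.w1 = nn.Parameter(torch.tensor(0.0))
            self.bn = nn.BatchNorm2d(channels)

        def forward(self, x):
            return self.w0 * x + self.w1 * self.bn(x)

    class CAN32(nn.Module):
        """10-layer context aggregation network: 4-channel input, 6-channel output."""
        def __init__(self, in_ch=4, out_ch=6, width=32):
            super().__init__()
            dilations = [1, 2, 4, 8, 16, 32, 64, 128, 1]  # layers 1-9 from the table
            layers, ch = [], in_ch
            for d in dilations:
                layers += [nn.Conv2d(ch, width, 3, padding=d, dilation=d),
                           AdaptiveNorm(width),
                           nn.LeakyReLU(0.2)]
                ch = width
            layers.append(nn.Conv2d(ch, out_ch, 1))  # layer 10: 1 x 1, no norm/activation
            self.net = nn.Sequential(*layers)

        def forward(self, x):
            # x: (B, 4, H, W) shadow image + mask -> (B, 6, H, W) coefficient maps
            return self.net(x)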