AnimeDiffusion: Anime Diffusion Colorization

Yu Cao†1, Xiangqiao Meng†1, P. Y. Mok1, Tong-Yee Lee2, Xueting Liu3, Ping Li1
1 The Hong Kong Polytechnic University, Hong Kong
2 National Cheng Kung University, Taiwan
3 Saint Francis University, Hong Kong
† Indicates equal contribution

Abstract

Colorizing anime line drawings is an essential step in animation creation, but it is usually a tedious and time-consuming manual task. Reference-based line drawing colorization offers an intuitive way to colorize target line drawings automatically using reference images. The prevailing approaches are based on generative adversarial networks (GANs), yet these methods still cannot generate results whose quality is comparable to manual coloring. In this paper, a new approach, AnimeDiffusion, is proposed that uses hybrid diffusions to automatically colorize anime face line drawings. This is the first attempt to use a diffusion model for reference-based colorization, a task that demands a high level of control over the image synthesis process. To this end, a hybrid end-to-end training strategy is designed: phase 1 trains the diffusion model with classifier-free guidance, and phase 2 efficiently updates the color tone using a target reference colored image. In phase 1, the model learns to denoise and to capture structure; in phase 2, it learns more accurate color information. The hybrid training strategy accelerates network convergence and improves colorization performance. AnimeDiffusion generates colorization results with semantic correspondence and color consistency. In addition, the model generalizes to some extent to line drawings of different line styles. To train and evaluate colorization methods, an anime face line drawing colorization benchmark dataset, containing 31,696 training samples and 579 testing samples, is introduced and shared. Extensive experiments and user studies demonstrate that AnimeDiffusion outperforms state-of-the-art GAN-based methods and another diffusion-based model, both quantitatively and qualitatively.
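The abstract names classifier-free guidance as the basis of phase 1 training but does not spell out its formulation. The following is a minimal NumPy sketch of the generic technique: DDPM-style forward noising, random condition dropout during training, and guided noise prediction at sampling time. It assumes a linear beta schedule and an all-zero tensor as the "null" condition; these choices and every function name here are hypothetical illustrations, not the AnimeDiffusion implementation, whose conditioning on the line drawing and reference image is not described on this page.

```python
import numpy as np


def linear_alpha_bar(num_steps: int = 1000, beta_start: float = 1e-4,
                     beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed, DDPM-style)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Forward diffusion: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * noise."""
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise


def predict_x0(xt: np.ndarray, t: int, alpha_bar: np.ndarray, eps: np.ndarray) -> np.ndarray:
    """Invert the forward step given a noise estimate eps."""
    a = alpha_bar[t]
    return (xt - np.sqrt(1.0 - a) * eps) / np.sqrt(a)


def drop_condition(cond: np.ndarray, p_uncond: float, rng: np.random.Generator) -> np.ndarray:
    """Training-time condition dropout: with probability p_uncond, swap the
    condition for a null condition (an all-zero tensor, a hypothetical choice)."""
    if rng.random() < p_uncond:
        return np.zeros_like(cond)
    return cond


def guided_eps(eps_cond: np.ndarray, eps_uncond: np.ndarray, guidance_scale: float) -> np.ndarray:
    """Classifier-free guidance: move from the unconditional noise prediction
    toward (and, for guidance_scale > 1, past) the conditional one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

With `guidance_scale = 1` the guided prediction reduces to the conditional one and with `0` to the unconditional one; values above 1 strengthen adherence to the condition at some cost in diversity. A single network can produce both predictions because condition dropout during training teaches it the unconditional case as well.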

Dataset

Pipeline

Qualitative Evaluation

Qualitative comparison for anime faces with homochromatic and heterochromatic pupils: (a) reference images, (b) line drawings, (c) Lee et al., (d) Li et al., (e) Cao et al., (f) Xu et al., and (g) AnimeDiffusion.