Traditional Classification Neural Networks are Good Generators: They are Competitive with DDPMs and GANs

Technical Report

Guangrun Wang, Philip H.S. Torr
Torr Vision Group (TVG), University of Oxford

Our Result Overview

Abstract

Classifiers and generators have long been treated separately. We break down this separation and show that conventional neural network classifiers can generate high-quality images across a large number of categories, comparable to state-of-the-art generative models (e.g., DDPMs and GANs). We achieve this by computing the partial derivative of the classification loss with respect to the input and optimizing the input to produce an image. Since directly optimizing the input is widely known to resemble a targeted adversarial attack, incapable of generating human-meaningful images, we propose a mask-based stochastic reconstruction module that makes the gradients semantics-aware, enabling the synthesis of plausible images. We further propose a progressive-resolution technique that guarantees fidelity and yields photorealistic images. In addition, we introduce a distance-metric loss and a non-trivial distribution loss to ensure that classification neural networks synthesize diverse, high-fidelity images. Using traditional neural network classifiers, we generate good-quality images at 256$\times$256 resolution on ImageNet. Intriguingly, our method also applies to text-to-image generation by regarding image-text foundation models as generalized classifiers.

Proving that classifiers have learned the data distribution and are ready for image generation has far-reaching implications, because classifiers are much easier to train than generative models such as DDPMs and GANs. Often no training is needed at all, since many pretrained classifiers are publicly available for download. This result also holds great promise for the interpretability and robustness of classifiers.
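The core idea above can be sketched in a few lines: treat the input pixels as parameters and descend the classification loss with respect to them. The sketch below is illustrative only, assuming a generic PyTorch classifier; it omits the paper's mask-based stochastic reconstruction, progressive resolution, and distribution losses, and the toy classifier, function names, and hyperparameters are our own, not from the report.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def synthesize(classifier, target_class, steps=100, lr=0.05, size=32):
    # Start from random noise and treat the pixels themselves as the
    # parameters to optimize (the classifier's weights stay frozen).
    x = torch.randn(1, 3, size, size, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Gradient of the classification loss w.r.t. the *input*
        # pushes the pixels toward the target class.
        loss = F.cross_entropy(classifier(x), torch.tensor([target_class]))
        loss.backward()
        opt.step()
    return x.detach()

# Toy stand-in for a pretrained ImageNet classifier (illustrative only).
classifier = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
).eval()
for p in classifier.parameters():
    p.requires_grad_(False)

img = synthesize(classifier, target_class=3, steps=20)
print(img.shape)  # torch.Size([1, 3, 32, 32])
```

As the abstract notes, this bare optimization alone behaves like a targeted adversarial attack; the paper's masking and progressive-resolution components are what turn it into a usable generator.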

AI Generator and Problems

Classifier as Visualizer and Problems

Framework

Progressive Resolution

Distance Metric or Distribution Loss

Text-to-Image Generation

Sampling

Comparison with GAN, DDPM, and DeepDream

Samples of CaG

Text-to-Image: “an illustration of a baby daikon radish in a tutu walking a dog”

Text-to-Image: “an armchair in the shape of an avocado”

Text-to-Image: “a photo of a phone from the 20s”

Text-to-Image: “An astronaut playing basketball with cats in space as a children book illustration”

More Text-to-Image Samples

Ablation Studies

Related Links

Goodfellow et al., Generative adversarial networks, 2014.

StyleLight: HDR Panorama Generation for Lighting Estimation and Editing, 2022.

Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis, 2022.

Text2Light: Zero-Shot Text-Driven HDR Panorama Generation, 2022.

Dhariwal et al., Diffusion Models Beat GANs on Image Synthesis, NeurIPS 2021.

Ho et al., Denoising Diffusion Probabilistic Models, NeurIPS 2020.

Nichol et al., Improved Denoising Diffusion Probabilistic Models, ICML 2021.

Olah et al., Feature visualization, Distill 2017.

Mahendran et al., Understanding deep image representations by inverting them, CVPR 2015.

Mahendran et al., DeepDream with TensorFlow, 2016.

Nguyen et al., Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, CVPR 2015.

Nguyen et al., Visualizing GoogLeNet classes, 2015.

BibTeX


@article{wang2022cag,
  title={Traditional Classification Neural Networks are Good Generators: They are Competitive with DDPMs and GANs},
  author={Wang, Guangrun and Torr, Philip HS},
  journal={arXiv preprint arXiv:2211.14794},
  year={2022}
}
