Stable Diffusion - Introduction and a tutorial for using the models

by Jongwon Lee | 25 days ago
#ComputerVision #DeepLearning #Anythingv3
Stable Diffusion on Google Colab and Apple Silicon, using models such as Anythingv3 and ACertainThing
- Introduction to Stable Diffusion
Diffusion models work by gradually adding noise to the training data and then learning to recover the data by reversing this noising process. The noising is typically split into around 1000 small steps for each training sample.
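
To make the noising half of this concrete, here is a minimal sketch using only the Python standard library. It assumes a linear noise schedule with 1000 steps; the function names and the start/end values are illustrative choices of mine, not something taken from the Stable Diffusion code. It shows how little of the original signal survives by the last step.

```python
import math
import random

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (illustrative values, an assumption)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_signal(betas):
    """alpha_bar[t] = product of (1 - beta) up to step t, i.e. how much signal is left."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Jump straight to step t: x_t = sqrt(a) * x0 + sqrt(1 - a) * gaussian_noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]

betas = linear_beta_schedule()
alpha_bars = cumulative_signal(betas)
rng = random.Random(0)

x0 = [1.0, -1.0, 0.5, 0.0]                     # a toy 4-"pixel" image
x_early = add_noise(x0, 10, alpha_bars, rng)   # still close to x0
x_late = add_noise(x0, 999, alpha_bars, rng)   # almost pure noise
```

Training then amounts to showing the model noisy samples like `x_late` and asking it to predict the noise that was added, so that it can later run the process in reverse.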

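A Latent Diffusion Model first compresses each image into a much smaller numeric representation (a latent) with an autoencoder, and runs the diffusion process on those latents instead of on raw pixels. This approach saves time and memory, and it improved model performance for the same amount of resources.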
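
A quick back-of-the-envelope sketch of how much smaller the latent is. It assumes the commonly used Stable Diffusion v1 setup (8x downsampling per side and 4 latent channels); the helper names are hypothetical and only for illustration.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (channels, height, width) of the latent for a given image size."""
    if height % downsample or width % downsample:
        raise ValueError(f"height and width must be multiples of {downsample}")
    return (latent_channels, height // downsample, width // downsample)

def compression_ratio(height, width, image_channels=3):
    """How many image values there are per latent value."""
    c, h, w = latent_shape(height, width)
    return (image_channels * height * width) / (c * h * w)

square = latent_shape(512, 512)        # (4, 64, 64)
portrait = latent_shape(896, 504)      # (4, 112, 63), the size used later in this article
ratio = compression_ratio(512, 512)    # 48.0
```

Under these assumptions the model works on 48 times fewer values than the raw pixels. The same arithmetic is also why image sizes need to be multiples of 8.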

Stable Diffusion is a Latent Diffusion Model released with the backing of a company called Stability AI. It was trained on subsets of a large public image-text dataset called LAION. Even though a lot of (funded) money went into this project, the model was released openly under a license that also allows commercial use. Moreover, individuals can fine-tune Stable Diffusion on additional image data to change its style and genre. This is why the community is growing so quickly.

- Usage
An image like this one, with a girl and a piano, can be used as a thumbnail for an animated piano cover on YouTube. You can also generate more abstract "art" and maybe hang it in your room. Artists, illustrators, and creators can get ideas by generating art from the topics and prompts they are working on. The Stable Diffusion model is still not perfect, though. In the image with the girl and the piano, the model placed the piano backward. Also, people often end up with the wrong number of fingers. Human illustrators can fix these details before using the images for commercial purposes.

In this article, I am not going to talk about training models. Instead, this tutorial shows you how to use pre-trained models to generate images and store them efficiently.

- Colab Code (https://colab.research.google.com/drive/1llzrCkKY0OIGpZGctFW-F8kMgExlJgVu?usp=sharing)
The easiest and most efficient way to use Stable Diffusion is Google Colab, because it provides a free GPU (CUDA in this case). Colab limits daily usage (to around 100 images in my experience), but you can generate more the next day!

!pip install --upgrade -qq git+https://github.com/huggingface/diffusers.git transformers accelerate scipy xformers gradio translate
from diffusers import StableDiffusionPipeline
import torch
from datetime import datetime
from pytz import timezone
from google.colab import files

You can change the model, branch, scheduler, etc. here.
Setting the safety checker to None gives you more freedom, since otherwise a lot of outputs are censored even though they are safe for work.

model_id = "Linaqruf/anything-v3.0" #a model from a local folder or from https://huggingface.co/models, e.g. JosephusCheung/ACertainThing
pipe = StableDiffusionPipeline.from_pretrained(model_id).to("cuda")
pipe.safety_checker = None

The prompt list and the negative prompt list can contain anything you imagine, although the model may not handle certain prompts well.
You can help the model with more detailed prompts or by adjusting the number of steps or the guidance scale.
Generally, increasing the number of steps improves quality but consumes more time and resources.
A higher guidance scale makes the model follow your prompts more strictly, but it may make the output look unnatural and low-quality.

prompt_list = [
    '1girl', ##main subject #scenery if you want 
    'pink eyes','beautiful eyes','pink hair', 'tall', ##style
    'holding orb', ##props
    'park', 'summer', ##where
    'sitting on chair', ##posture
    'curious', ##adjectives
    'black dress', ##clothes
    'masterpiece','best quality','CG','wallpaper','HDR','high quality','high definition', 
]

negative_prompt_list = [
    'lowres','(bad anatomy, bad hands:1.1)','text','error','missing fingers',
    'extra digit','fewer digits','cropped','worst quality','low quality','normal quality',
    'jpeg artifacts','signature','watermark','username','blurry',
    'artist name','b&w','weird colors','(cartoon, 3d, bad art, poorly drawn, close up, blurry:1.5)',
    '(disfigured, deformed, extra limbs:1.5)','missing fingers'
]

prompt = ', '.join(prompt_list)
negative_prompt = ', '.join(negative_prompt_list)

num_inference_steps = 60
guidance_scale = 15
width = 504
height = 896
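
To give some intuition for what the guidance scale does, here is a minimal sketch of the blending step usually called classifier-free guidance. At every step the model predicts the noise twice, once with the prompt and once without it (when you pass a negative prompt, it takes the place of the "without" case), and the two predictions are combined. The function name and the toy numbers are mine; the real pipeline does this on tensors, not lists.

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Move away from the unprompted prediction, toward the prompted one."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]

uncond = [0.0, 1.0]   # toy prediction without the prompt (or with the negative prompt)
cond = [1.0, 1.0]     # toy prediction with the prompt

neutral = apply_guidance(uncond, cond, 1.0)    # [1.0, 1.0], same as the prompted prediction
typical = apply_guidance(uncond, cond, 7.5)    # [7.5, 1.0]
strong = apply_guidance(uncond, cond, 15.0)    # [15.0, 1.0]
```

Only the values where the two predictions disagree are amplified, and they are amplified in proportion to the scale. This is why very high values follow the prompt more closely but can push the image toward harsh, unnatural results.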

Colab's one flaw is that it does not let you keep your files once you are disconnected.
This trick saves the files in Colab's /content directory and also downloads them to your local machine.
You can set the range to 50-100 and review the images after a meal.

for i in range(2): #use longer range to generate and save more images
    image = pipe(
        prompt,
        negative_prompt = negative_prompt,
        num_inference_steps = num_inference_steps,
        guidance_scale = guidance_scale,
        width = width,
        height = height,
    ).images[0]
    filename = '/content/image_{}.png'.format(datetime.now(timezone('US/Eastern')).strftime("%Y%m%d-%H%M%S"))
    image.save(filename)
    files.download(filename) #use Chrome and allow the site to download multiple files

Now go check the outputs, modify the prompts, and run again.
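
If the browser keeps asking for permission on every file, one alternative is to bundle the images and download once. The sketch below uses only the standard library; the helper names and the archive name are hypothetical. It also adds the loop index to the filename so that two images finished within the same second cannot overwrite each other.

```python
import zipfile
from datetime import datetime
from pathlib import Path

def unique_image_path(directory, index, now=None):
    """Timestamp plus loop index, so names never collide within one run."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return Path(directory) / f"image_{stamp}_{index:03d}.png"

def bundle_images(directory, archive_name="images.zip"):
    """Collect every PNG in `directory` into one zip and return the archive path."""
    directory = Path(directory)
    archive_path = directory / archive_name
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for png in sorted(directory.glob("*.png")):
            archive.write(png, arcname=png.name)
    return archive_path
```

In the Colab loop you would save each image with `image.save(unique_image_path('/content', i))` and, after the loop, call `files.download(str(bundle_images('/content')))` once instead of downloading every file separately.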

- Apple Silicon Code

I am on an M1 Pro, but considering the performance and the time it takes, I would rather use Colab.
However, for lighter models and prompts that are easier to draw, this works fine.
Most importantly, getting this to run locally is fulfilling, and it also works offline.

Assuming you have Jupyter Notebook and conda set up, run the following in a terminal:

$ conda create -n torch-nightly python=3.8 
$ conda activate torch-nightly
$ pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
$ pip install -U diffusers transformers pytz

import os
from datetime import datetime
from pytz import timezone
import torch
from diffusers import StableDiffusionPipeline

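if torch.cuda.is_available():
    device = 'cuda'
elif torch.backends.mps.is_available(): #This is doing the trick for Apple Silicon
    device = 'mps'
else:
    device = 'cpu' #fallback when neither CUDA nor MPS is available (slow)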
print(f'running on {device}')
pipe = StableDiffusionPipeline.from_pretrained(
    "Linaqruf/anything-v3.0", #a model from a local folder or from https://huggingface.co/models, e.g. JosephusCheung/ACertainThing
    cache_dir=os.getenv("cache_dir", "./models")
).to(device)
pipe.enable_attention_slicing() # Recommended if your computer has < 64 GB of RAM
pipe.safety_checker = None

prompt_list = [
    'scenery','shibuya tokyo','((post-apocalypse))','ruins','rust','sky',
    '(skyscraper)','abandoned','blue sky','broken window', 'building',
    'cloud','crane machine','outdoors','overgrown','pillar','sunset',
    'masterpiece','best quality','CG','wallpaper','HDR','high quality','high definition', 
]

negative_prompt_list = [
    'lowres','(bad anatomy, bad hands:1.1)','text','error','missing fingers',
    'extra digit','fewer digits','cropped','worst quality','low quality','normal quality',
    'jpeg artifacts','signature','watermark','username','blurry',
    'artist name','b&w','weird colors','(cartoon, 3d, bad art, poorly drawn, close up, blurry:1.5)'
]

prompt = ', '.join(prompt_list)
negative_prompt = ', '.join(negative_prompt_list)

num_inference_steps = 25
guidance_scale = 7.5
width = 504
height = 896

for i in range(1): #use longer range to generate and save more images
    image = pipe(
        prompt,
        negative_prompt = negative_prompt,
        num_inference_steps = num_inference_steps,
        guidance_scale = guidance_scale,
        width = width,
        height = height,
    ).images[0]
    filename = './image_{}.png'.format(datetime.now(timezone('US/Eastern')).strftime("%Y%m%d-%H%M%S"))
    image.save(filename)

If you are interested in more details, Hugging Face has documented most of these things well, for example:
https://huggingface.co/docs/diffusers/main/en/using-diffusers/schedulers