Arnab Saha - Engineering Leader
Posts

Exploring Generative Image AI with Stable Diffusion 2023

June 16, 2023

4 min read

ai, stable-diffusion, generative-ai, creative

I'll be honest: while this may be one of the trendier topics these days, I've always been an avid artist, and generative AI feels like a great melting pot of art and technology. I was also really curious to figure out how some of the AI-generated videos on social media were being made.

While there are several text-to-image and image-to-image services making the rounds of late, like Midjourney, Stability AI's DreamStudio, and various other apps, I was curious to find an open-source route (no limits, and free!) that supports batch generation of frames that can be compiled into a video.

The Tools

With that in mind, I came across these two great repos:

Stable Diffusion WebUI

This is an open-source web interface for Stability AI's Stable Diffusion, along with a bunch of community contributions, that lets us do text-to-image or image-to-image generation from prompts.

Stable Diffusion Deforum

Deforum is an extension on top of Stable Diffusion that generates batches of frames. You can provide an init image, and it diffuses subsequent frames based on your configuration and prompts.

If you're interested in the internals of Stable Diffusion, this article is a good read. In this post, I'm more interested in the setup and creative process!

Hardware Requirements

I was quick to learn that while setting up these tools locally is fairly straightforward, you certainly need a beefy system with an NVIDIA GPU to get decent results.

While I did some initial experiments on my PC, which satisfied the requirements, I came across a more convenient way of doing this via notebooks: Google Colab for Stable Diffusion!

This notebook uses Google's backend and NVIDIA GPUs to help you run long-running workloads like ours.

My Creative Experiment

My aim was to fuse AI frame generation (via Deforum) with one of my original artworks as the init image.

I used one of my original digital artworks, made in Procreate on an iPad, as the starting point.

Configuration

Here's a great tutorial that takes you through the entire setup process. The camera configuration is the most important setting for directing the motion of the video:

def DeforumAnimArgs():
    # Animation Settings
    animation_mode = '3D'
    max_frames = 240
    border = 'wrap'

    # Motion Parameters: keyframe schedules as "frame:(value)" pairs,
    # interpolated between keyframes
    angle = "0:(0)"
    zoom = "0:(0)"
    translation_x = "0:(0), 30:(15), 210:(15), 300:(0)"
    translation_y = "0:(0)"
    translation_z = "0:(0.2), 60:(10), 300:(15)"
    rotation_3d_x = "0:(0), 60:(0), 90:(0.5), 180:(0.5), 300:(0.5)"
    rotation_3d_y = "0:(0), 30:(-3.5), 90:(-2.5), 180:(-2.8), 300:(-2), 420:(0)"
    rotation_3d_z = "0:(0), 60:(0.2), 90:(0), 180:(-0.5), 300:(0), 420:(0.5), 500:(0.8)"

    # Coherence
    color_coherence = 'Match Frame 0 LAB'
    diffusion_cadence = '1'

    return locals()  # Deforum reads the settings from the returned dict
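Those motion parameters are keyframe schedules: comma-separated "frame:(value)" pairs, with values interpolated between keyframes. Here's a minimal sketch of how such a schedule could be read; `parse_schedule` and `value_at` are illustrative helpers of my own (Deforum's actual parser is more capable and also supports math expressions), using simple linear interpolation:

```python
import re

def parse_schedule(schedule):
    """Parse a string like "0:(0), 30:(15)" into [(0, 0.0), (30, 15.0)]."""
    pairs = re.findall(r"(\d+)\s*:\s*\(([-\d.]+)\)", schedule)
    return [(int(f), float(v)) for f, v in pairs]

def value_at(keyframes, frame):
    """Linearly interpolate the schedule's value at a given frame."""
    for (f0, v0), (f1, v1) in zip(keyframes, keyframes[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return v0 + t * (v1 - v0)
    return keyframes[-1][1]  # hold the last value past the final keyframe

keys = parse_schedule("0:(0), 30:(15), 210:(15), 300:(0)")
print(value_at(keys, 15))  # halfway between frames 0 and 30 -> 7.5
```

This is why `translation_x = "0:(0), 30:(15), 210:(15), 300:(0)"` reads as: ease into a rightward pan over the first 30 frames, hold it, then ease back out.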

And the image settings that impact the end result:

def DeforumArgs():
    # Image Settings
    W = 720
    H = 1280

    # Sampling Settings
    seed = -1               # -1 picks a random seed each run
    sampler = 'dpmpp_2s_a'
    steps = 50
    scale = 7               # guidance scale: how closely to follow the prompt

    # Init Settings
    use_init = True
    strength = 0.6          # influence of the init image on the output
    init_image = "/content/drive/MyDrive/AI/init-images/init-image.png"

    return locals()  # Deforum reads the settings from the returned dict
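One detail worth calling out: a seed of -1 conventionally means "pick a fresh random seed each run", which is why two runs with identical settings can look different. A sketch of that convention (`resolve_seed` is an illustrative helper, not Deforum's actual code):

```python
import random

def resolve_seed(seed):
    """A seed of -1 means: generate a fresh random seed for this run.
    Any other value is used as-is, making the run reproducible."""
    if seed == -1:
        return random.randint(0, 2**32 - 1)
    return seed

print(resolve_seed(42))  # fixed seeds reproduce results -> 42
```

If you land on a frame sequence you love, note the resolved seed from the logs so you can regenerate or tweak it later.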

Stitching the Video

Once the frames are generated in your Drive, you'll have a PNG sequence that you can download. Then it's just a matter of stitching it together in any video editing software; I used DaVinci Resolve (the free version!).
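If you'd rather skip the video editor, a command-line tool like ffmpeg can stitch a PNG sequence too. A small helper that builds the command (assuming ffmpeg is installed; the frame pattern and output name below are placeholders for your own paths):

```python
import subprocess

def ffmpeg_stitch_cmd(pattern, output, fps=30):
    """Build an ffmpeg command to stitch numbered PNG frames into an MP4."""
    return [
        "ffmpeg",
        "-framerate", str(fps),   # input frame rate
        "-i", pattern,            # e.g. frames named frame_00001.png, ...
        "-c:v", "libx264",        # widely compatible H.264 encoding
        "-pix_fmt", "yuv420p",    # pixel format most players expect
        output,
    ]

cmd = ffmpeg_stitch_cmd("frame_%05d.png", "deforum.mp4", fps=24)
print(" ".join(cmd))
# Run it with: subprocess.run(cmd, check=True)
```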

Conclusion

This is definitely an interesting space with a lot of experimental value in how one can "create" art and content these days. I've personally held the moral high ground that art should be unadulterated, but the tech-inclined side of me finds this fascinating.

One's creative output is only limited by imagination at this point! I, for one, am keen on experimenting with and adopting more AI in my creative journey.