Intro
CLIP is a neural network from OpenAI that learns visual concepts from natural language: it's trained on unfiltered & very noisy image-text pairs, and it scores how well a piece of text describes an image. That lets it categorize images (e.g. identifying cats & dogs) or match search queries to a database of images.
And then this person decided to use CLIP to train the SIREN network (which, uh, is way too high-level for me to understand) to generate images that match a given description, which we now know as Deep Daze.
Aleph2Image is a more recent attempt at this sort of text-to-image generation that uses parts of DALL-E as the generator in conjunction with CLIP.
The plan for this post is pretty much to just run tests of whatever comes into my mind & explore the limitations & possibilities of Aleph2Image.
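To make the "match search queries to a database of images" part a bit more concrete, here's a minimal sketch of the idea: CLIP turns text and images into vectors (embeddings) and compares them with cosine similarity. The 3-number embeddings, the file names, and the `rank_images` helper below are all made up for illustration; real embeddings have hundreds of dimensions and come out of the model's encoders.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_images(query_embedding, image_embeddings):
    """Return image names sorted from best match to worst (hypothetical helper)."""
    scores = {
        name: cosine_similarity(query_embedding, emb)
        for name, emb in image_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Made-up embeddings standing in for what an image encoder would produce.
image_embeddings = {
    "cat.png": [0.9, 0.1, 0.0],
    "dog.png": [0.1, 0.9, 0.0],
    "city.png": [0.0, 0.1, 0.9],
}

# Pretend this is the text embedding for "a photo of a cat".
query = [0.8, 0.2, 0.1]

print(rank_images(query, image_embeddings))
```

The generators in this post roughly flip this around: instead of picking the best existing image, they keep nudging a generated image until its score against the prompt goes up.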
Prompt: a neon city at night
A decent first test.
Prompt: a cloud of smog painted on a canvas
It's more of a blob but sure whatever.
Prompt: a solarpunk warship
You can clearly make out the warship, but it looks like it's being attacked by a solar flare.
Prompt: a rainy cobblestone street
I think it tried to make cobblestone rain onto a street, but I'm not entirely sure.
Prompt: a cat wearing a birthday hat
This is just an undulating mass of cat flesh.
Prompt: a bee listening to jazz
Jazz is a gateway to many abilities some consider to be unnatural.
Prompt: a demonic symbol in the sky revealing hell
It looks a bit like the Reddit snoo character.
Prompt: a cafe in a monsoon
I let this run for a bit and when I came back I forgot what the prompt was and had no idea what I was looking at.
Prompt: an anime girl made out of garlic bread
V O I D G A R L I C.
Conclusion
"I was afraid that not a single thing on earth would ever again surprise me", Jorge Luis Borges
"Uuooooooaaaahhhhohhhhhhhh okay", Vinny
I'm pretty excited to see what this will evolve into. For now though, I don't see a use for any of the images I got other than maybe inspiration.
Special Thanks
@advadnoun was the creator of the Google Colab that made all of this possible. You can donate to him via Venmo at "@rynnn".
Intro II
Okay so I found this Google Colab for Dall-E Mini, so let's do this again and see what happens.
Prompt: a neon city at night
Okay, this looks a lot cooler than the last one, even if you can't really see any detail whatsoever.
Prompt: a cloud of smog painted on a canvas
This looks like some sort of abstract art. Pretty good!
Prompt: a solarpunk warship
It's really easy to see the warship & water. This is really cool.
Prompt: a rainy cobblestone street
It doesn't look rainy, but there is very clearly a cobblestone street & buildings. I'm also digging the painterly style.
Prompt: a cat wearing a birthday hat
You can barely make out the cat & even a bit of the birthday hat. Looks like pretty good abstract art.
Prompt: a bee listening to jazz
It correctly got the shallow depth-of-field effect you get from a camera focused on something really small, but not the bee, unfortunately.
Prompt: a demonic symbol in the sky revealing hell
I can't make heads or tails of this one.
Prompt: a cafe in a monsoon
Now this is pretty fucking sick. A lot of these images could look orders of magnitude better with human manipulation.
Prompt: an anime girl made out of garlic bread
Looks like a character that would come out of an obscure RPG Maker game lmao. These images are giving me so many ideas.
Prompt: a euclidean bedroom
It looks more like a bathroom.
Prompt: a non-euclidean bedroom
Ironically, this one looks more like a room than the previous one.
Prompt: a lavish hotel lobby
I swear this looks more like a bathroom than a hotel lobby.
Prompt: a cute anime girl
Holy shit. Hoooly fucking shit. Another character that would fit seamlessly into an obscure RPG Maker game.
Prompt: a cute anime boy
This is like an alternate reality fever dream version of Avogado6.
Prompt: a kobold in a hoodie
Okay, so it took the mythological interpretation of the kobold instead of the furry version.
Prompt: a cute kobold in a hoodie
Same issue as the previous attempt.
Prompt: a redditor
I can't make out anything in this one.
Prompt: a sign that says, "ybubbus"
I wasn't expecting it to work, and it didn't, but I think it tried to make a storefront.
Prompt: an isometric view of a pixelated car
Isometric Pablo Picasso.
Prompt: the Notre Dame made of human flesh
I guess that might be the Notre Dame.
Prompt: a bottle of water
I was not expecting such an abstract image to come out of this prompt.
Prompt: a violent bottle of water
Not only is this one more recognizable as a bottle of water than the previous prompt, you can even see it trying to replicate the Shutterstock watermark.
Prompt: Francis Bacon in the style of Francis Bacon
Okay, wow. That actually looks like a Francis Bacon piece.
Prompt: Francis Bacon in the style of Francis Bacon in the style of Francis Bacon
Obviously a later Francis Bacon piece.
Prompt: a Pikachu poster
That's... not a Pikachu.
Prompt: Jim Carrey is an anti-vaxxer
Looks like we caught the bastard in the middle of some shape-shifting.
Prompt: Joe Biden's America
Literally Hide the Pain Harold.
Prompt: Donald Trump's America
Literally a Jellyfish.
Prompt: Barack Obama's America
Literally Boris Johnson.
Prompt: banana
A for effort.
Conclusion II
Dall-E Mini is much, much faster than Aleph2Image, at the expense of resolution. But that's a fair trade-off, and what I want to do now is see if I can modify these images to be better.
Special Thanks II
The Dall-E Mini Colab I used was made by "mega b#6696" on Discord and was incredibly easy to use.
Addendum: Batch-editing Dall-E Mini Output
Here are all of the previous Dall-E Mini images you saw, run through various filters:
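For anyone who wants to try this kind of batch-editing, here's a minimal sketch of the idea: define a few filters, then push every image through the same chain. Everything in it is an assumption for illustration, not the actual workflow: the `invert` and `posterize` filters, the `batch_apply` helper, and the tiny 2x2 grayscale "images" (nested lists of 0-255 values) are all made up. In practice you'd load real files with an image library.

```python
def invert(pixels):
    """Flip light and dark: 0 becomes 255 and vice versa."""
    return [[255 - value for value in row] for row in pixels]

def posterize(pixels, levels=4):
    """Snap each pixel down to one of `levels` evenly spaced brightness steps."""
    step = 256 // levels
    return [[(value // step) * step for value in row] for row in pixels]

def batch_apply(images, filters):
    """Run every image through the filters in order; originals stay untouched."""
    results = {}
    for name, pixels in images.items():
        for apply_filter in filters:
            pixels = apply_filter(pixels)
        results[name] = pixels
    return results

# Hypothetical stand-ins for generated images: 2x2 grayscale grids.
images = {
    "neon_city": [[0, 64], [128, 255]],
    "banana": [[10, 200], [90, 30]],
}

edited = batch_apply(images, [posterize, invert])
for name, pixels in edited.items():
    print(name, pixels)
```

Keeping each filter as a plain function that takes an image and returns a new one makes it easy to reorder the chain or swap filters in and out.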
Conclusion III
This is pretty fun.