Beyond Ghiblified Images: Introducing ChatGPT 4o

By Raul Gutierrez • March 29, 2025


You might have heard a loud sonic boom ricocheting around the AI image-generation world this week, after OpenAI opened access to the new, highly capable image-generation features of the awkwardly named ChatGPT 4o. For many people, this was the first time they had used an image model, and if memes per second are any indication, it's the first time AI image making has started to break through to the general public (apparently, countless people were waiting to be Ghiblified, Miyazaki's horror be damned). While much of what ChatGPT 4o does is possible with other models, OpenAI's model is no-fuss. It's very good with one-shot prompts, so people have good experiences right away. This is a moment we have long anticipated, and it is one of the reasons we built PictureStudio. OpenAI's image model will be available as part of PictureStudio soon (OpenAI says it will be available to developers in a few weeks). And beyond the memes, it will open up a new set of amazing possibilities for pro users. Our goal is to help pros harness these tools by giving them a consistent interface that stays stable and predictable as the underlying tech changes.

A progression of three images of a person, going from very flat to painted.

Let's back up to discuss what this model is, why it's important, and how it fits into our toolset. The OpenAI model is multimodal (meaning it was trained across text, images, and audio). It can see, talk, and reason about images in real time. Put simply, it has a complex understanding of the real world as presented through text and images. OpenAI writes, "It can follow detailed instructions, including reliably incorporating text into images. And because it is embedded natively, deep in the architecture of our omnimodal GPT-4o model, 4o image generation can use everything it knows to apply these capabilities in subtle and expressive ways, creating images that are not only beautiful, but also useful."
In practice, images generated with 4o are much less likely to have some of the issues common with earlier image-generation models. The new model shines when creating images from complex prompts, when editing images with simple commands, and when using an existing image as a base and recasting it in another style. The example below is a one-shot recreation of Manet's Un Bar aux Folies-Bergère. We use this painting as a test image because many of its details (the reflections, for example) are hard to get right in a transformation. And while there's still room to improve, this is a major step change in quality.

Manet's Un Bar aux Folies-Bergère and its one-shot recreation in ChatGPT 4o

There are some tradeoffs. These types of models tend to be slow. The styles they produce can be uniform (limited by the model's internal representation of style). For example, if I use the word 'comic' and you use the word 'comic', we'll get images in an almost identical style. Steering results away from those default styles using only ChatGPT 4o can be difficult or confusing. These are just a few of the problems we're trying to solve.

Internal representation can also reinforce bias or stereotyping. Request an image of a Dutch woman on a country road, and she might be pictured wearing a 17th-century bonnet and surrounded by windmills. You can steer images away from these stereotypes, but it helps to understand what is happening under the hood.

The more these models understand about reality, the harder it can be for them to break away from it, so for people who work in non-representational styles, this type of model might be less flexible than older-style models. Conversational editing isn't for everyone. This is a big one: especially for visual people, explaining what you want in words is sometimes harder than using visual tools or simply drawing it. Finally, results are (currently) not deterministic, meaning they are hard to repeat, and repeatability is important for many types of artistic workflows.

These drawbacks, while real, shouldn't undercut the importance of this model. It will allow many people to go from idea to image faster than they ever thought possible.

Our approach gives access to these new types of models side by side with older models, so creative people can use the best of whatever tools are available. We believe this is just the first of what will be a small tidal wave of advanced models arriving over the next few months, and giving our users coherent access to them is our North Star.

A few practical examples of how models might be used in concert with each other: multimodal models are great at taking an image and reposing the subject. But if the pose changes too much, the likeness of the person is reduced. In PictureStudio, you might reposition the subject with a multimodal model and then "repaint" the person with a model of that person to bring the likeness back. In the example below, we're trying to recast the vintage poster in the background of the original image (first image) without losing my likeness. You can do this in PictureStudio today, but the background isn't precise (second image). ChatGPT 4o creates a nice set of tropical plants in the background and otherwise matches the image, but it subtly changes my face. The plant labels are also off (third image).

Face progression

Once we bring 4o into the platform, you'll be able to create workflows that easily fix these problems, with a final step that marries the best of both worlds.

Another example: say you want to create panels for a comic book with text. You love the way OpenAI's model can position characters and produce text, but you don't like that all of its comic output falls into a narrow, specific style. You could create the comic with OpenAI's model and then restyle and recast it using another model with a more expressive and unique style.

Or let's say you've created a unique character using a mix of techniques in Flux or Imagen and want to reposition it with OpenAI's model. This workflow will become easy in PictureStudio.

We expect that you're going to see a lot of movement in this space and that ChatGPT 4o is just the first in a wave of impressive image models. We're committed to building the tools to help you harness them for professional workflows and image pipelines. Stay tuned!