New advanced AI image tools
May 31, 2025

We now give you more paths to go from idea to image in PictureStudio!
Advanced Descriptive Models
New advanced models allow you to work with complex, detailed prompts. Last week, we integrated Google's cutting-edge image model, Imagen 4. We also enhanced our Imagen 3 capabilities with exciting new features. Both of these Google models excel at understanding highly complex prompts and work beautifully with our existing styles.
Google's flagship model, Imagen 4, shines when it comes to handling intricate prompts. We love how Imagen 4 stays faithful to your written instructions—it won't invent scenes or add elements you didn't ask for. This encourages more precise prompt writing, a good practice which we think leads to better results overall. While the Imagen 4 model doesn't handle reference images in prompts yet, we're expecting that capability to arrive soon.
We've also given Imagen 3 major upgrades in the image-handling department. You can now add reference images to your Imagen 3 prompts and use style transfer images with this model. Coming up next with Imagen 3: control images that let you transfer subjects, products, and compositions between images.
Multi-Modal Models
Last week we also added multi-modal models that can understand both instructional and descriptive prompts. We've started with GPT-4o and Gemini, with more multi-modal models coming soon.
These multi-modal models open up incredible possibilities. You can upload a sketch and ask the model to complete it or transform it.
For the three images below, we started with a somewhat blurry snapshot of a face sketched on a Post-it note and transformed the image through the prompt (these examples are available to inspect in our Example Projects).
The transformations are endless. You could upload a photograph of someone and ask the model to create a professional corporate headshot of that person, feature them as a character in a children's book illustration, or render them as a cast member in a sci-fi TV series.
While older models like SDXL have many methods for transforming images, these multi-modal models make it simple—just drag an image into your prompt and tell the model what you'd like it to do. Keep in mind that these models don't work with the actual pixels of your image. Instead, they "understand" the image and recreate it using their training. This means these techniques work best when you're looking to transform the person or subject in some meaningful way. GPT-4o is particularly impressive at these transformations, though it does come with a higher cost and slightly longer processing times.
And here's a bonus: these multi-modal models can also combine multiple images according to your instructions, giving you even more creative control over your final result.