Business has undergone some vast changes in the past 30 years; the Internet itself changed how we communicate with each other and how we run our businesses. Smartphones give us access to the whole world in our hands. We now have access to laser-targeting when marketing to individuals, and more recently, social media has changed how we sell. Today we stand at the edge of the next and possibly greatest evolution: AI.
I’m sure you can find a lot of valuable and varied information about this phenomenon with a simple web search, so we are not going to add to that noise. Instead, we are going to look at today, this moment in time, and how AI can actually be leveraged in how we work right now, specifically for the creation of an animated explainer video. Will it make our lives easier? Could it save us time and money? Could our clients do it themselves, and if so, are we all now out of a job?
Here at Explainify, we have a five-step process when creating a video, so we are going to lean on those steps and see whether there’s an AI tool out there that does what we do already.
Step one – Research & Ideation
Using AI to develop ideas and research is currently possible with quite a few AI language models. The current forerunners as of May 23 are ChatGPT (the ‘original’, on which most of the others are based), Bing Chat and YouChat (both internet-linked, so they can give more up-to-date responses than ChatGPT), and Jasper (best for business and marketing, but costs money). There are many others out there, and no doubt many more will emerge each day going forward, but one or more of the ones listed above are a great place to start for now.
At this time, Explainify is using ChatGPT (with a WebChatGPT browser extension that links it to the internet) to do additional research before our discovery calls and kickoff meetings with new clients. We have found it a great way to get summaries of broad research and up-to-date news on clients, their brands, and current competition. However, this is, at best, supplementary information and does not replace all the branding and tone guidance we get directly from talking with our clients. It’s an ‘add-on’ rather than a replacement.
Another use case could be to use a language AI to come up with content suggestions for your videos. But this currently involves a lot of time spent coaching, guiding, and training to get to any truly original or specific creative content, as what usually comes out of ideation sessions is generic, somewhat cliched, and obvious.
Also, it’s hard to actively brainstorm with a language AI, as there is a distinct wait between prompting it and getting a response. Our experience has shown that some of the best creative ideas come from our in-person discussions with our clients, where we can rapidly jump from idea to idea. So while it is possible, the current quality of ideas is dry and limited, and there are no real time savings to be had here. We do not use these tools for this purpose.
However, nothing stops you from using chat AIs to generate some suggestions that you can bring to the table for the real creative in-person meeting.
Our conclusion here is that we are happy to use AI tools to help us get as much information as we can prior to meeting with clients, but only real human interaction produces the best ideas and creative solutions for our clients.
Step two – Scriptwriting
As scriptwriting is primarily text-based, you can probably guess correctly that it is possible to get a language AI to write a good script for you. What we’ve seen so far is pretty impressive, but still in its early days, so not yet brilliant or super original. As we know, language AIs get their training from the data they are given, and there are a lot of written scripts available, so it’s no surprise that they can knock out a pretty passable script very quickly. However, as with the brainstorming and ideation sessions, what is currently produced is somewhat limited and quite often cliched.
We have been investigating several AI language tools, but the current leader is ChatGPT. As with all language models, you have to spend some time ‘training’ the conversation about the client, their ideas, the subject of the video you are creating, and several other factors, like the call to action, messaging, target audience, tone, perspective, and voice. As of writing, there are two live versions of ChatGPT: 3.5 (which produces copy at the level of a high schooler and can contain factual errors, as it has a tendency to make things up) and 4, which is far more developed, can even write in dialects and regional accents, writes at a much higher level, and is far less likely to fabricate information. However, while version 4 is better all-around, it is currently limited by a question cap (25 questions, or prompts, every 3 hours).
Another downside of using a language model is that editing content is also somewhat limited, but it is improving. In our recent trials, we had it write a script and then rewrite it from various perspectives and points of view (changing the audience from users of a thing to peers in the industry to potential financial backers of the thing). It can do this, but only really by changing a sentence or two rather than rewriting the whole script each time. This shortcut approach is fine, but not really at the standard that we are looking for and that our experienced scriptwriters currently offer our clients.
But what do we know, right? In order to test this, we have started sending our clients two scripts, one written by one of our team of expert copywriters and the other by ChatGPT. The client is unaware that one is AI-generated until the end, but our clients have picked the human-written script in every single case to date. This may change as we get better at prompting the AI, though.
So our conclusion right now is that while it is possible to use AI with scriptwriting, it’s not yet at the quality level we require for our work.
Step three – Voice Over and Music
With VO, we are technically looking at text-to-voice solutions: something that takes the text from the script and can vocalize it.
Some interesting AI tools can be ‘trained’ on a human voice to build a comprehensive dataset of it. We’ve looked at a few, but Descript, WellSaid, and ElevenLabs all stand out currently. Descript does a great job at text-to-voice. WellSaid and ElevenLabs have banks of pretty solid voices you can use “out of the box”, and by pasting in a script or writing one from scratch, you can produce a pretty believable voiceover. With all three, you can train them to use your voice (or a voice actor could too). The more you record, the more realistic the AI voice sounds. In fact, after asking several of our current voice actors, we have found that a lot of them are actively using such tools to produce a much quicker turnaround for pitches: they create a model of their voice, drop in an example script, and send off that file when being shortlisted for a job. This makes perfect sense and is a great example of people adapting and using AI to enhance their work lives. Overdubbing and rewording are pretty easy to do too.
But a great voiceover still comes down to the actor’s ability to change tone and intonation on the fly in response to a client’s request. We sometimes offer live readings to our clients, where they can listen and direct the actor so they get exactly what they need. VO AI is not yet at the level where you can comprehensively and rapidly change a recording, say from an English to a Scottish accent, or from a clear pronunciation to a thick regional dialect, in one reading.
The VO tools also offer libraries of voices, which is great until everyone and their robot dog uses the same voices. We want to differentiate our clients’ work, which currently still means using a live, real-life actor. That said, we are happy to send a client an AI-trained demo that’s a snippet of their script if that’s what the VO actor wishes.
With music, we are looking for background sound that helps carry the script’s story and can even emphasize key moments or the overall ‘feel’ of the brand. Historically this has been done by commissioning musicians (expensive), by using sound libraries (cheaper but can be a little generic), or, rarely, by licensing songs (very expensive). Again, there are quite a few AI tools out on the web already. We’ve looked at Soundful, Boomy, and Riffusion, amongst others. While Boomy and Riffusion are a little too random for any real use in our industry, Soundful can rapidly produce original music across a wide variety of genres. The benefit of creating original background music is that our clients’ work can be unlike any other content. This is very much in the same camp as writing and voiceover: while it is technically usable right now, we are still in the early stages of this technology, which comes with some surprising results.
Recently there has been controversy in the music industry, as a few AI-generated tracks have been released as though they were ‘sung’ by real artists or were their own material. This is definitely taboo and a huge legal issue, especially when the AI uses ‘found’ music as the raw source material for its creation.
Conclusion? Well, while we are fully interested in using AI-generated music and sounds one day, for now we are watching the space to see what develops.
Step four – Design and Storyboarding
Our design stage comprises three main items: a storyboard (a series of simple drawings that illustrate the script and suggest basic camera positions and movements); a Style Frame, which offers at least a couple of different visual styles, fully colored, for the client to pick from and revise into how the whole animated piece will ultimately look; and a Designboard, a document similar to the storyboard that shows one keyframe for each storyboard scene with the chosen style applied to it. The animation team then uses this final designed file as keyframes with assets, which is very helpful in creating a sustained and consistent style throughout the animation.
So, taking each design item one by one, let’s see what the current AI landscape looks like.
There are currently no working AI tools for storyboarding. There are what I like to call cyber stake-claim sites, such as Storyboard Hero, which is simply a waitlist for something that appears to be seeking funding at this time but didn’t want to lose the URL. You can use image generation AI tools like MidJourney, Stable Diffusion, and Dall-E 2 to create screens, but the amount of work spent training the tool, tweaking the prompts, and downloading/uploading images and reference materials, only to get somewhere near what you envision, means they are not really a replacement or enhancement for having a professional illustrator or animator create one. Even if we did create one from a patchwork of separately generated images, we would still have the underlying issue of not having workable layers.
So, we cannot offer, use, or create storyboards generated by AI alone at this time. If you’d like to read more about this, Michal Bucholc, creative director at GONG, has written a great article on Medium, Storyboarding with AI, that breaks down the current issues in a lot more detail. Bucholc also recently added another, more advanced piece about storyboarding that updates a few things. But without quite a lot of human input and editing, it’s not yet at a usable or timesaving level.
To be totally transparent with you, I originally started writing this article over a month ago, then researched a lot of AI tools; every day, something new came to market or dropped off. The landscape is changing so rapidly that from the moment this piece goes live, it will already be outdated. With that in mind, when I started writing this section, Style Frames were possible, but each one was a crapshoot, whereas today, getting a consistent character or scenario across different styles is possible. However, there is a big issue with the current state of AI image tools: they only generate a flat image. In order for us to be able to use it effectively, we would have to download the generated image, then cut, mask, edit, and layer each one to allow for even the simplest depth of field (for example, splitting the background from the foreground, or the focus point from the foreground and background layers). Remembering that the use case for AI that we are looking for is to make our lives easier or to automate the time-consuming parts of our process, this would actually be far slower and thus ultimately cost everyone more. Even after all the current advancements, the tools are still not really robust or stable enough to consistently produce good quality work for us.
Designboards (where we marry the sketch of the storyboard to the color styles of the Style Frames) suffer the woes of both of the above toolsets. Ultimately, as of writing this article, we still prefer to use our illustrators to build each of these visual components of our deliverables, especially as they all feed the animation teams with what they need to get their job done.
Conclusion? Not yet really a viable use case here, but some really promising things are on the horizon.
Step five – Animation
The next, and our longest and most complicated, stage is animation, where we take the designboards, storyboards, and voice-over, combine, edit, and sync them all together, then mix in the music and sound effects to produce the final product. And as of writing this, there is no viable or usable AI solution for this. Not even really for a part of it. As we saw earlier, we can generate parts of the process and information that feed the final product, but no all-inclusive animation AI tool will give us what we need as of writing.
Some wonderful and very powerful tools will take live action and replace actors with animated characters, such as Wonder Dynamics’ Wonder Studio. It is yet to go live and looks great, but we’d need to film everything in the real world first, then create animated characters, then map them onto the live action. Gen-1 from Runway will take a short (15-second) video and overlay a different animation style, or you can train it to create a style from a set of images, illustrations, photos, etc. The end product is not really ideal, and for the precise work we do, it is too random to be of any real use. Runway Gen-2 looks fantastic and will apparently offer text-to-video, so in theory, we’d be able to write animation prompts, but it’s not out yet.
You even have Imagen from Google, which looks to be great but is not yet live.
For a short period of time, a tool (ModelScope text to video synthesis) existed that would take a text prompt and generate a 2-second clip of video, but that is now gone, most likely due to the fact it was stealing copyrighted source images from online libraries like Shutterstock and Getty Images, then blurring the watermarks on them.
There are a whole bunch of tools, like Synthesia, Fliki, and D-ID, that offer text-to-video AI. They all use library avatars, or you can upload your own (so you could create one in something like MidJourney and upload that). But they all produce static, face-to-camera, presenter-style formats with no movement or other animation around the presenter. Think of a filter in TikTok, Snapchat, or Zoom.
Conclusion: nowhere close, and it will probably be a long minute before any real changes to our workflow happen at this stage.
If you’ve made it this far, thanks for hanging in there! This is a journey for sure, a rollercoaster of wonder, and it’s all a little scary too.
In 2018, Dell published a report that said 85% of the jobs that will exist in 2030 haven’t even been invented yet. I honestly feel that this will happen far sooner than 2030 at this rate. What will our working world look like next year or even next month? There are going to be many changes and things to look forward to. The best way to handle these upcoming changes is to tackle them head-on, and here at Explainify, we intend to be early adopters, learners, AI wranglers, or whatever the new world needs us to be to continue producing the highest quality work we can for our clients.
Thank you for reading! If you’d like to know more about what we do for businesses, have a wander around. If you’d like us to help you make great marketing, explainer, brand, PR, sales, app, launch, or any other kind of animated video, please reach out to talk to one of our Video Strategists.