Kay’s script helper with AI!

AI is the big hype in every industry – at least if you believe breathless marketers who write newsletters to each other that nobody reads. But what if a VFX supervisor took on the topic? What does it do approximately? And now let’s find out ….
Ai-VFX-Breakdown Workflow

That’s exactly what happened at FMX – Kay Delventhal built an “Ai VFX Breakdown” generator as a test and reported on the results – what works, what doesn’t, and where is the edge in between. Kay Delventhal is a VFX supervisor with many years of experience – he was involved in Cloud Atlas, Sisi, Ku’damm 59 and many other productions. He is also active in the Visual Effects Society, has taught at the h_da, the Filmakademie and at The Animation Workshop, and is a member of the German Filmakademie – more information and contact at delventhal.com And he gave the FMX lecture on the subject of supervisors’ AI helpers, and we asked him what exactly it’s all about and what came out of the test.

Kay Delventhal
Kay Delventhal


DP: Hello Kay! As of today, what does your system do? And how did you get started with the test?
Kay Delventhal: As a visual effects (VFX) supervisor, I create a lot of VFX breakdowns. In some productions, there are a lot of script updates that have to be searched for VFX again and again and compared with the VFX breakdown. This is almost always a manual process that is tedious and error-prone.
To make this work easier, I have already tried to automate parts of it in the past, unfortunately only with limited success using “normal programming”. The “Ai VFX Breakdown” is a proof-of-concept and works as a prototype to the extent that I can use it for my own projects. It consists of various modules, with two parts standing out.
One is the part that uses Artificial Intelligence (AI aka AI aka ML) to find the VFX in the scripts and the second is the data pipeline that allows me to create VFX breakdowns faster.
In the summer of 2023, I had the idea that Generative Pre-trained Transformers (GPT) basically have a semantic understanding of text, which means that these systems can understand context. They can use embeddings to link basic, inbuilt knowledge to an answer.
And I like the idea that if GPTs have a basic understanding of VFX, this could be used to ask questions, including specific questions, about VFX. The first tests with OpenAi ChatGPT showed that there is knowledge about Visual Effects (VFX) in ChatGPT. But ChatGPT 3.5 struggled to find VFX in the text and to return the data in a readable form.
When GPT4 was introduced in the summer of 2023, the results were much better and I wanted to test finding VFX in film scripts in practice.


DP: How did you come up with the idea of doing so much groundwork?
Kay Delventhal: I studied in England in 1992 and took an “Artificial Intelligence” (AI) seminar there. I’ve been interested in AI ever since. In January 2024, a series project was unfortunately cancelled at the last minute and I suddenly had (too) much time on my hands and immediately decided to start developing the prototype for “Ai VFX Breakdown”.
I was motivated by two factors. Firstly, the aforementioned interest in AI; like many others, I have been amazed at the rapid developments in the field of AI in recent years and have been testing and trying out different tools time and time again. On the other hand, I wanted to better understand myself what these AI systems can actually do and I wanted to use these tools myself to get an idea of what is possible with GPTs, for example, and where these systems reach their limits.
I had previously carried out tests with RAG technology (Retrieval Augmented Generation) and other models, but I wasn’t satisfied with the results and didn’t want to invest my time there any further.
That’s why I decided in favour of OpenAi and GPT4 because it was the best model at the time, the tests were also promising and the OpenAi API is relatively easy to use.

DP: So how much effort do you have to put into KayKI to “analyse” a script for VFX shots and add notes, comments and an approximate cost estimate?
Kay Delventhal: Of course I still read through all the scripts and make notes on what takes time. But thanks to parallelisation, the “Ai VFX Breakdown” can work through several series scripts in just a few minutes. However, the result is raw data that still needs to be reworked. After I had parallelised the processes, the processing speed was amazing and, without the pre-processing of the scripts, is just a few minutes.
The tool currently has two major weaknesses. One is the so-called “task definition”, which describes what the actual VFX task is. And this “task definition” is currently imprecise and often has to be reworked. Secondly, the tool is oversensitive and always finds significantly more VFX than I would mark in the script myself. This depends on the project and the prompt priming and can be between 150% and 500% of the relevant VFX quota.
However, the second part of the data pipeline allows me to compensate for these two weaknesses through effectiveness, so that I am still faster overall than I was manually before.
Another advantage is that I work data-based and this allows me to analyse script updates at scene level and I only have to rework where there are actually VFX-relevant changes.
The following graphic shows that if you run the “Ai VFX Breakdown” over the same scenes more often, the accuracy with which VFX shots are found increases. If three out of four runs show the same number of hits, this is a good result and is marked with a green “1”.
Next, you can directly compare the VFX shots found from different runs. So ask: “Does the system not only find the same number, but are the VFX shots tasks also identical?”
If the four runs are compared via text similarity, 100 % similarity is the best possible result. Good results are marked with a green “100 %”.
Here we can see that the “Ai VFX-Breakdown” is basically quite stable in identifying VFX in film script texts. But you can also see the above-mentioned weakness in the “Task Definition”. This is an area where the system still needs to be improved.

DP: Can we just call it the KayKI?
Kay Delventhal: In my opinion, the name “Ai VFX Breakdown” fits quite well, as I don’t do any training or model development myself. As an artist, I use existing tools to create new techniques and new workflows.

VFX-Shot Detection Consistency per Scene, as „VFX-Shot-Count per Scene“
VFX-Shot Detection Consistency per Scene, as “VFX-Shot-Count per Scene”


DP: How does Chat analyse GPT scripts?
Kay Delventhal: The “Ai VFX Breakdown” currently consists of various modules. The scripts have to go through a pre-process where scene data is extracted and saved. In general, it seems favourable to keep the texts that are processed with GPTs shorter. However, there is basically no length limit for scripts or the number of episodes per season.
The system then generates a JSON file for each (possible) VFX, with a shot count field as an indicator that longer passages often result in more than one VFX shot.
The process itself is based on so-called “prompting”. Prompting in Generative Pre-trained Transformers (GPT) refers to the input of text or queries that guide the model to generate relevant and coherent answers or content.
A “prompt” can be a question, the beginning of a sentence or a specific instruction that causes the GPT model to respond to the request made. Targeted and precise prompting can improve the quality and relevance of the generated content (answers), as the model takes into account specific requirements and the context of the enquiry. A large part of the work for the “Ai VFX-Breakdown” has been the prompt development and prompt optimisation.

VFX-Shot Similarity Detection per Shot, as „VFX-Shot-Similarity per Shot“
VFX-Shot Similarity Detection per Shot, as “VFX-Shot-Similarity per Shot”


DP: “Hallucinations” are a well-known and feared problem? Do they throw a spanner in the works when it’s a relevant step in film production?
Kay Delventhal: There are problems in certain areas that could be explained by the phenomenon of “hallucinations”. For example, the fact that the system tends to find more VFX than a human – it tends to be “highly sensitive”. And the fact that there is a certain variance in the analysis is probably also partly related to the phenomenon of “hallucinations”.
Attempts to create character and asset lists with the system also show that the lists are more often funny than useful. The “Task Definition” mentioned above is also always entertaining. This area definitely still needs work.
The phenomenon of “hallucinations” can be partially contained by good prompting. The clearer the tasks are defined, the easier it is for the system to fulfil the tasks within the scope of its capabilities. The phenomenon of “high sensitivity” can only be controlled to a very limited extent and some of the results that are more strongly influenced by these effects still have to be reworked manually.
In other areas, however, the system is amazingly reliable and generates astonishingly good and consistent data. Especially when you consider that the data definitions are not hard-coded, but are generated by prompts, i.e. by pure question texts. This is truly amazing.

Ai-VFX-Breakdown Workflow
Ai-VFX-Breakdown Workflow


DP: What “creative limits” have you built into KayKI?
Kay Delventhal: At the moment, the limits of the system are determined by the prompt design. A module that subsequently checks the generated data is planned and will be integrated.
Unfortunately, there are currently very few options for influencing the text generated by ChatGPT. In principle, GPTs are simple: they generate an output text for any input text. And the input text doesn’t even have to be a question.
With GPT4, there are basically very few parameters that can be influenced. For example, there is the length of the response in tokens, the so-called token limit. I haven’t really had any use for this yet, but you can use it to limit the length of the output text.
And there is the so-called “temperature”. A higher “temperature” should make the output text more “creative” and a lower “temperature” leads to less “creative” output texts. It is a parameter that can be used to influence the “text randomness”.
The “Ai VFX-Breakdown” works better at a low “Temperature”, which can be recognised primarily by the fact that the script texts are copied unchanged. A high “Temperature” led to more errors in the JSON files.
There are other parameters such as “Frequency Penalty” and “Nucleus Sampling” or “Presence Penalty”, which relate to the frequency of individual words in the response, for example. And which I have not yet been able to analyse in more detail due to the time required for test series. The repetition of words has not been a real problem so far.


DP: Can you show us a prompt that you use for KayKI?
Kay Delventhal: The development of the prompts took a lot of time and in the end the prompts were optimised for the projects. The system then generates better results.
Prompts can become very complex and often contain not only instructions, but also single or few-shot examples and priming. Single or few-shot examples are used to define the output format of the JSON data.
Priming helps to create contexts and enriches the system with additional knowledge about VFX and the current project.

Example of a prompt:

Role:
You are an AI Visual Effects (VFX) Supervisor, fluent in English and German, and analyze scripts for VFX in scenes.

Task:
To do this, you read thoroughly through the scene text and spot potential VFX.

Format:
Category for “TYPE”:
RETOUCHE: Removal of modern elements.
FLYING_OBJECT: Objects flying via VFX.
CAR_CRASH: Simulated vehicle accidents.
VISTAS: Digitally enhanced landscapes.
MOOD_LIGHTING: Artificial atmospheric lighting.
ESTABLISHING_SHOT: Context-setting shots with VFX.
SET_EXTENSION: Expanded or altered sets.

Output:
{
“CODE”: “004_0010”,
“TYPE”: “COMPOSITING”,
“TEXT”: “A full moon shines in through a window.”,
“TASK”: “Create a digital moon.”,
“COUNT”: 1
}


DP: Phew, that’s quite a few requests! How many iterations did you need for this prompt?
Kay Delventhal: In fact, I had to test many prompts very often and usually over a large data set, so often over entire scripts.
This led to so-called “token limit warnings”, whereby OpenAi then cancels the processing and informs you in response that you have reached the current token limit for your account and that you cannot continue working for the time being.
Token limits are set in tiers for the normal Pro account. You can then “work your way up” there and increase the token limit. See: https://platform.openai.com/docs/guides/rate-limits
The token limit is also a protection, as it prevents high costs from being generated by mistake. You pay OpenAi via API by token and if you process a lot of data in parallel, high costs can also arise. See: https://openai.com/api/pricing/


DP: How bad is the headache when you hear the self-proclaimed gurus shouting “Yo! Prompt, dude! Yo!”?
Kay Delventhal: Since AI (aka KI aka ML) is the “hot shit” right now, everyone is jumping on the topic on all social networks and on the ÖRR. And they’re commenting and reporting as much as they can.
What strikes me is that reporting in Germany often tends to be negative and surprisingly uninformed. So-called risks are pointed out, and not everyone even realises what AI can be used for. Is this “German AI fear”?
Other problems, such as the energy hunger of these systems or the fact that there are only a few companies in Germany that develop AI, GPTs or stable diffusion at an international level, are rarely addressed, even though these problems are very real.
The development is rapid and there are always leaps in development and one should not forget that many of these systems did not exist 5 years ago.
My personal opinion is that AI is here to stay. And it’s about exploring the possibilities. That’s why everyone should use AI, no matter what it’s for, to form their own judgement, to see and understand the opportunities and then to be able to have a qualified say when it comes to problems and risks, but also benefits and opportunities.

GPT Prompting for Ai-VFX-Breakdown Workflow
GPT Prompting for Ai-VFX-Breakdown Workflow

DP: So KayKI generates additional data from the text – what do you do with the additional information that is generated?
Kay Delventhal: Some of the additional information generated or analysed by the system is used when the data is converted into Excel or Google Sheet format.
What is fed in there can be configured. And the aim would be to collect even more data and feed it into a database in order to further optimise the processes.
A long-term goal would be to be able to create a shot cost optimisation. With more data, this is absolutely conceivable.

DP: So – if everything works – you can practically jump directly from the script to the storyboard?
Kay Delventhal: If we assume that we have successfully post-processed data generated with the “Ai VFX-Breakdown”, then we could use Stable Diffusion, for example via API ComfyUi or Dalle-3 for image generation via prompt.
I wouldn’t call these images storyboards, as they are only generated without any real creative input. But even today it would be possible to feed in location photos, and there could be a new type of image that is generated automatically, which could be called Viz-Ref.
Camera operators and directors already often use Midjourney to generate reference images that they can’t find on the internet.
This image generation can be taken further, for example by specifying viewing angles. This could then be taken further and further and refined. At the moment, the system can automatically generate images based on the VFX shot data combined with scene data. And a controllability can be recognised in these images. However, the images are not concept art or storyboards, they are an automated visualisation.

Automatisierte Bildgenerierung mit DALL-E 3
Automated image generation with DALL-E 3


DP: Thinking further: could I also create a “Previz” timeline instead of Jsons?
Kay Delventhal: A “Previz” timeline would not be feasible as the system is currently set up. There are companies that are working on scheduling solutions, such as RivetAI. They are developing an AI-supported platform for planning and budgeting productions. See: https://www.rivetai.com/
But “Ai VFX-Breakdown” cannot currently implement such planning. What would be conceivable is to write data directly into a database instead of saving it as JSON. If you then save timecode (TC), metadata and handle data, you could also generate additional data, such as turnover or line-up sheets.


DP: And taking this a step further – could I create an output for bidding in KayKI that calculates an “approximate house number” using rough preliminary information?

Kay Delventhal: A shot-cost estimation is absolutely conceivable. I have already thought about implementing it, but it also needs corresponding reference or base data. And anyone who has ever tried to obtain good training data or generate it themselves knows that it’s not easy. This is definitely already on the to-do list.
Apart from good training data, it’s not easy to compare VFX projects with each other. I’ve been creating “guesstimations” for a while now, often based on shot count, to give clients an idea of the approximate cost of the project.
But with more training data, you should definitely be able to determine an “approximate house number”. Here too, you could easily optimise the figures with a simple, manual “difficulty factor”.

DP: And if we take the idea a step further: Could I also generate files, projects and scripts for the pipeline in the same way?
Kay Delventhal: As a client-side VFX supervisor, that’s not so interesting for me. However, I can customise my internal VFX numbers and the output data so that Excel files or Google Sheets are generated that can be easily read by vendors.
This then allows vendors to easily or even fully automatically generate files, projects and scripts for their pipeline. This has already proved its worth in day-to-day work.

DP: Can you imagine further “outputs” in standard formats?
Kay Delventhal: Basically, the “Ai VFX Breakdown” and the integrated data pipeline have opened my eyes to standardisation, including for VFX breakdowns. Automation and standardisation allow for easier integration and open up new possibilities in other software environments and data pipelines.
As in other areas of industrialisation, DIN standards or the Alembic cache or USD architecture have allowed companies to work together better and therefore faster and often more efficiently. This also applies in principle to the data side of VFX processing.
For some trades, there are already solutions that try to analyse film scripts for pre-production and shooting and make a plan. I already mentioned RivetAi, another tool would be Scriptation. See: scriptation.com/features/tagging/ or also Studiobinder, see:
is.gd/studiobinder_tags
For film scripts, there is Fountain, which attempts to make scripts easier to read (by machines) using a Markdown-like format. See: https://fountain.io/

DP: Is the “software test” complete for you, or are you still expanding your toolbox?
Kay Delventhal: For new productions, the “Ai VFX Breakdown” is used whenever possible. And no, the “test” has not yet been completed, but is in full swing.
The tool or prototype basically works, but the application reveals weaknesses and features that need to be improved. And there is also quite a long wish list of features that should be integrated.
In addition to my bread-and-butter job as a VFX supervisor, I often don’t have the time to drive the project forward.

DP: To put it bluntly, would a standardised language for film scripts – now that writers are better paid – be a great thing?
Kay Delventhal: A kind of “labelling language” (tagging), I don’t think so. Authors can already tag content today if they use FinalDraft. But this is rarely done. See: https://blog.finaldraft.com/tech-tips-tagging-for-writers
Film scripts are highly standardised and have proven themselves well as they are. Certain formalities, especially when it comes to scene headers, are of course helpful. And a Markdown format, such as that used by Fountain, would make it easier to read the text.
Such a syntax would further standardise film scripts and simplify automated processing.
An awareness of these issues would also be desirable for authors, as they also want their scripts to be realised as films or series.

DP: If we look at your tool in 10 years’ time, will it be a “built-in function” in the operating system?
Kay Delventhal: That would be conceivable. However, VFX break-down is a niche business and the full automation that is often mentioned and demanded misses the actual goal.
Really good tools simplify work and increase productivity. As far as that is concerned, I do think that AI will significantly change our work, including in the creative sector, through automation and increased efficiency.
We shouldn’t forget that the knowledge built into AI systems, the embeddings, will allow many people to access this knowledge, not just as text, but also as images or videos, and thus break new ground and create something new.