AI transcription - DIGITAL PRODUCTION

Digital Anarchy ends Transcriptive web

Bela Beier — Fri, 01 May 2026 08:26:33 +0000

For those who don’t know the tool: Transcriptive is (or was) a text-based editing panel for Adobe Premiere that transcribes media via web services, plus related panels, and it now leans on imports and alignment.

The date that matters, and what actually stops

Transcriptive and its related products reach end of life in May, 2026. The affected lineup includes Transcriptive, Transcriptive Rough Cutter, PowerSearch, the Transcriptive Web App, and EFF-IT!.


The change lands on the web side. After the shutdown, you cannot generate new transcripts through the Transcriptive panel, and the Web App will no longer be available. An email notice sets the cutoff for new transcripts to May 4, 2026.

If you already have transcripts, the Adobe Premiere panel continues to work and existing projects should stay intact. The panel keeps letting you open and edit transcripts you already created, and it keeps working with transcripts you import from elsewhere.

What still works inside Premiere, for now

The parts that survive are the local bits. The panels continue to run as long as Adobe keeps supporting CEP panels, while Adobe moves toward UXP for extensibility. That matters because Transcriptive, PowerSearch, and EFF-IT! sit in the CEP panel world. Digital Anarchy expects that older panels should continue to work for at least a couple of years, likely longer, but the long term direction stays with UXP.


In other words, your old transcripts do not evaporate, and your sequences do not suddenly uncut themselves. They just stop getting fed by the online transcription pipeline once the web services go dark.

Alignment becomes the escape hatch

There is one workflow the shutdown does not take away: importing transcripts from other sources and using the free Alignment function in the panel. If you already live on SRT, VTT, or third-party speech-to-text, alignment becomes the bridge back to text-based editing in your existing Transcriptive projects.
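For anyone moving between the two subtitle formats named above, the following is a minimal sketch of an SRT to WebVTT conversion. The article does not say which file layout the panel's import expects, so treat this only as an illustration of how close the two formats are; the function name `srt_to_vtt` and the sample text are hypothetical.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT subtitle text to WebVTT text (hypothetical helper)."""
    lines = srt_text.replace("\r\n", "\n").strip().split("\n")
    out = ["WEBVTT", ""]  # WebVTT files start with this header and a blank line
    for line in lines:
        # WebVTT uses a period, not a comma, before the milliseconds.
        out.append(TIMING.sub(r"\1.\2 --> \3.\4", line))
    return "\n".join(out) + "\n"


sample = (
    "1\n00:00:01,000 --> 00:00:03,500\nHello there.\n\n"
    "2\n00:00:04,000 --> 00:00:05,250\nSecond cue.\n"
)
print(srt_to_vtt(sample))
```

The numeric cue identifiers are kept as they are, since WebVTT allows optional cue identifiers. Styling tags and positioning data are not handled here.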

That also turns PowerSearch into more of a local indexer story. PowerSearch keeps operating in Premiere as a panel, and it can still search what is already in your project. The end of life does not remove it from your install, it removes the cloud services Transcriptive used for generating new transcripts.

If you depend on any of this in a shared pipeline, treat the transition as a tool change, not a footnote. Test the post-shutdown behavior on a real project copy before you commit a production show to it.

Subscriptions, renewals, and refunds

Renewals for Transcriptive subscriptions have already stopped. If any renewal charges show up, the vendor asks customers to reach out for a refund.

The end of life notice also says users can keep using existing transcription access until May 2026. Once prepaid minutes run out, the public notice describes a pay as you go option for transcription during the remaining period before shutdown, with prepaid transcription credits no longer available.

Final builds and where to grab them

A final build with minor fixes is available for download for macOS and Windows.

The email notice points to version 3.17 installers here:
https://digitalanarchy.com/downloads/transcriptive_317_Pr.dmg
https://digitalanarchy.com/downloads/transcriptive_317_Pr.zip

The public end of life post points to version 3.16 installers here:
https://digitalanarchy.com/downloads/transcriptive_316_Pr.dmg
https://digitalanarchy.com/downloads/transcriptive_316_Pr.zip

If you archive toolchains for long running shows, stash the installer you actually deploy and document the host app versions you validated, especially with panel frameworks shifting over time.
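One way to document an archived toolchain is a small manifest with checksums, so you can later prove which installer was actually deployed. This is only a sketch: the folder layout, the field names, and the `build_manifest` helper are assumptions for illustration, not something the vendor provides.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(installer_dir: Path, validated_hosts: list[str]) -> dict:
    """Describe every file in installer_dir plus the host versions you tested."""
    installers = [
        {"file": p.name, "bytes": p.stat().st_size, "sha256": sha256_of(p)}
        for p in sorted(installer_dir.iterdir())
        if p.is_file()
    ]
    return {"validated_hosts": validated_hosts, "installers": installers}


if __name__ == "__main__":
    # Hypothetical archive folder and host label; replace with your own.
    archive = Path("toolchain_archive")
    if archive.is_dir():
        manifest = build_manifest(archive, ["<host app version you validated>"])
        print(json.dumps(manifest, indent=2))
```

Storing the manifest next to the installers keeps the checksum and the validated host versions together in one place.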

The rest of the catalog continues


This change only targets Transcriptive and the related panels. All other plugins remain available!

https://digitalanarchy.com/blog/video-editing-plugins/transcriptive-end-of-life-web-services-will-be-ending-in-may-2026/

https://digitalanarchy.com/eff-it/

https://digitalanarchy.com/transcriptive/

The post Digital Anarchy ends Transcriptive web first appeared on DIGITAL PRODUCTION and was written by Bela Beier.

DaVinci Resolve Version 19 – Speech recognition

Uli Plank — Sat, 14 Sep 2024 08:00:00 +0000

Until the final version is released, we are concentrating here on improvements of functions that were already available in version 18 but were not yet particularly mature. One of these is the transcription from audio to text.

In our article on 18.5 and 18.6 of DaVinci Resolve (DR for short) in DP 23:06, we already dealt in detail with the then new speech recognition. Unfortunately, we had to admit that there were still considerable comprehension problems. As the competition, such as Adobe with Premiere Pro, is not sleeping in this area either, this time we want to find out whether anything has changed.

In any case, a new feature is the optional recognition of the person speaking, to whom you can subsequently assign a name. Initially, we left the setting in the programme on “Auto” in the hope that the AI would find out the language itself. Unfortunately, this was a disappointment, because in a feature film with clearly understandable German dialogue, the programme only stated that a foreign language was present and reported the type of music and a few noises. Incidentally, this was done in English, even though we had switched the DR interface to German. You would expect the AI to orientate itself to the language setting of the operating system, or at least that of the GUI.

With “Auto”, the language was not recognised and a bit of English was hallucinated.

Only after selecting the language to be recognised in the project settings did we get a more usable result. However, after closer inspection, we again found the hallucinated, multiple repetitions of a sentence that was basically recognised correctly. These also appeared elsewhere in the film without dialogue, often before the actually existing sentence. There were also occasional omissions, as in version 18, with even slightly longer sections of dialogue occasionally being ignored. During a passage of jazz music with a small cast, without any dialogue at all, we got completely nonsensical text mixed with Cyrillic letters (or should that be Greek?).

Subtitles

As we have had relatively good results in the past with the “Create Subtitles from Audio” function in the timeline menu, this was our next test. To our surprise, this is obviously not the same AI, as the language is recognised correctly here even when set to “Auto” and the results tend to be better than with transcription. Where there were misinterpretations, these were not identical to those from the transcription.

To ensure that it was not just the language, we repeated the test with a French film. The results were largely comparable in terms of error rate and type of error, and the differences in language identification between transcription and subtitles were also the same. What was astonishing, however: if the film was in French but the language for subtitles was set to German, the AI was able to translate the subtitles reasonably well. Where the text was recognised correctly, punctuation and spelling were also surprisingly good, e.g. subordinate clauses or questions were rendered correctly.

As well as being accessed via the icon at the top, the transcription can be accessed by right-clicking on clips or a timeline in the Media Pool. It is irritating that after closing the text window, selecting “Transcribe” in the menu opens the window again, as if the analysis had to be repeated. A new feature here is that text can be imported from subtitles or exported with speaker information as a file in .srtx format. Logically, the errors are identical to those in the transcription window.

Recognised speakers can be exported with the subtitles.

Identify speakers

The menu also contains the new “Speaker recognition” function. After correct language selection, the AI needed less than 5 minutes with pure text recognition for a feature film of just under one and a half hours on a MacBook M1 Pro; with this additional function, it took a good 7 minutes. Unfortunately, the results were not outstanding in this respect either. Even male and female voices were not always correctly separated in short dialogues. In longer text passages, there was still confusion between speakers of the same gender, even between young and older voices.

Sometimes the AI is confused – only the negated variant occurred once.

In addition, the AI generated far more speakers in the list than were actually involved, especially for short dialogue passages of the same person. On the other hand, the text of a completely new person is often assigned to a person who has already appeared before. The AI had difficulties recognising the same person, especially when changing the tone of voice to express different emotions, as well as when whispering and shouting with room reverberation, even if the text content was recognised correctly. Overall, the new function is still not very useful under these conditions.

The optional recognition of the respective person is new.

In all tests, we noticed that speech recognition seemed to run much faster at the beginning, with up to 28 times real time being reported. After a short time, the display dropped to considerably lower values, in some cases less than ten times. We therefore also tested the possibility that a short clip could be analysed better than a longer one – unfortunately to no avail, there were no differences.

There were actually two speakers in this confused dialogue.

Filtering with AI

We had deliberately presented the AI with fully mixed feature films as an endurance test, because even in earlier versions, a commentary in clearly spoken English without any music or environmental noises was recognised almost perfectly. A trial with conventional normalisation only brought us various “Muffled Speaking” messages in the subtitle function, but DR now offers two neural filters for speech called “Voice Isolation” and “Dialogue Leveler”. When set to 100%, the first can actually remove most of the music and background from speech, but should generally be reduced somewhat as it can distort the sound.

Normalising the volume was not very helpful.

The Dialogue Leveler is primarily suitable for improving the intelligibility of speakers when the volume varies greatly. We also tested it for transcription, although the AI often recognised e.g. whispered speech surprisingly well. On the first attempt, the speech recognition seemed to fail completely after activating both filters, but this is probably due to the beta and could not be reproduced. The second time, everything ran smoothly and it only took a little longer than the pure transcription.

The new filters for speech do an amazing job – but not much for transcription.

Loud music or noises come through Voice Isolation as incomprehensible snippets of speech, which are occasionally interpreted as such by the AI. Otherwise, these experiments hardly brought any improvements for the transcription; the AI evidently already handles this quite well internally. The errors were only sometimes different, but not significantly less frequent. So you can save yourself the additional computing time in this respect. Regardless of this, both filters can be very useful, especially for documentary film under difficult conditions. They work quite fast even on modest hardware.

Comment

Blackmagic is clearly overpromising with the following marketing statement: “Due to recent advances in AI and expert system technologies, it’s become possible to get remarkably accurate and perfectly timed subtitles of spoken text using DaVinci Resolve’s Create Subtitles from Audio function.” We were unable to verify this, at least for German and French. Even the position in time is not always accurate, and you still need to check the AI results carefully.

As the errors with identical source material are largely the same as those in version 18, we do not expect any outstanding improvements in the final version either. In addition, DaVinci Resolve only recognises a dozen or so languages, while Whisper understands around 100. However, you would have to install this yourself, as it is open source. The new speaker recognition is also not very useful as long as it is too often unable to correctly identify people in fast dialogues. One wonders whether BM could not improve this by correlating it with the visual person recognition that has been available for some time.

The post DaVinci Resolve Version 19 – Speech recognition first appeared on DIGITAL PRODUCTION and was written by Uli Plank.

Blackmagic DaVinci Resolve – the Sequel

Uli Plank — Sun, 29 Oct 2023 17:30:00 +0000

Let’s start with transcription: The value of a new feature is, of course, extremely dependent on your workflow and specific needs. But for documentaries, especially with interviews, AI-based text recognition (only in the studio version) is perhaps the most useful new feature in 18.5. It can even be helpful in feature films if the director likes to let the actors improvise.

The selection of languages is wider than that for the programme itself.

We tested speech recognition on an entire feature film in order to test it under critical conditions, such as music and strong background noises. Even on a modest MacBook M1 Pro, it ran surprisingly quickly at 4 minutes and 15 seconds for an hour and 42 minutes of film. But then the typical behaviour of today’s AI became apparent – the result was somewhere between Wow! and What? Sometimes we were extremely impressed when it still recognised the text correctly despite the film music, where we had difficulties ourselves and had to listen several times.
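To put the timing above into the “times real time” terms used for speech recognition, the arithmetic can be sketched as follows. The helper name `realtime_factor` is our own; the figures are the ones quoted in the paragraph.

```python
def realtime_factor(media_seconds: float, processing_seconds: float) -> float:
    """How many seconds of media are processed per second of computing time."""
    return media_seconds / processing_seconds


film = (1 * 60 + 42) * 60  # 1 hour 42 minutes of film = 6120 seconds
analysis = 4 * 60 + 15     # 4 minutes 15 seconds of analysis = 255 seconds

print(round(realtime_factor(film, analysis), 1))  # prints 24.0
```

This is an average over the whole film; it says nothing about how the speed varies during the run.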

Sometimes the AI is also quite confused.

But there are also a lot of results that don’t (yet?) replace humans. In passages with long periods of silence or only quiet noises, it started to hallucinate. And not just in German, although we had switched from automatic to our language. Chinese or Korean text could still appear from time to time. Or absurd poetry such as “If you think the body Bugün? Cve is that giant?”

And who says AI isn’t creative? Word creations that probably don’t appear in the internal dictionary were, for example, “Kapitaneverbräucher” as an interesting alternative to “Kapitalverbrecher” or a nightmare that mutates into “Eiltraum”. Sometimes, however, the context is ignored in such a way that “photo album” can become “fodder album” when spoken quickly.


All of this may be rather amusing, but changes in word order that alter the meaning, such as “What happened?” instead of “Did what happen?”, can easily be overlooked. Strangely, sometimes even longer, clearly understandable passages are completely ignored. Conversely, the AI occasionally inserts a short text passage several times, which actually only occurs later in the film. In short: you can’t get by without careful checking and manual correction, although the function is better than in Premiere.

The transcription is linked to the viewer, with cut points for a marked sentence.

Editing assistant

Nevertheless, the AI can save a lot of time, especially when editing, because the recognised and, if necessary, corrected text remains linked to the image. If you have selected a text passage, of course also via text search, the corresponding point in the film is shown at the same time. A moving cursor appears in the text while you play the video. As corresponding cut points are also set temporarily, a section can be cut into the timeline immediately. This works in both the Cut and Edit pages. The corresponding functions can be found directly in the transcription text window, from playback buttons to insert or append to setting markers and creating subclips.

There are also text functions such as search and replace, changing the font size and optional display in black on white. Only major changes to the text length in this window can have unintended side effects when editing. Therefore, there is a copy function for the external use of text, but no paste function. The backspace key crosses out selected passages, but does not delete them. This results in the passage being omitted during editing. Of course, you can also export the entire text here if required. The text window can be freely configured and moved to a separate monitor.

German subtitles for English tutorials are created with the help of speech recognition and DeepL.

Subtitles

As the transcription is based on text and timecode, it makes sense to generate subtitles automatically in a similar way. The “Create Subtitles from Audio” function is available in the timeline menu for this purpose. This takes a little longer than the transcription, around three and a half minutes for half an hour of film. After that, the subtitle track contains clear texts if the speaker, in this case Cullen Kelly, was clearly audible. This time we tried a tutorial from Blackmagic (BM for short), in which the voiceover was completely clear and could even be heard without background music (it works!).

Now we wanted to create German subtitles for this video on colour management, as no German version exists in this case. In principle, this also works if you simply enter German instead of English in the dialogue box for the title generation, but unfortunately such an internal translation is not really convincing. We therefore renamed the exported *.SRT file to *.TXT and had it translated by the well-regarded DeepL. Unfortunately, this is only possible with the full version, as otherwise the time codes and line breaks in the file are not retained (but there is a test period).

We only had to rename the result to *.SRT again and were then able to import it as a subtitle file. Thanks to better technical conditions, the text recognition in English was already more accurate, as was the translation by DeepL. Nevertheless, it is essential to review the foreign-language version and correct it if necessary. This applies all the more to the German translation, as even DeepL does not yet have a complete grasp of technical terms such as those used in this complex topic. What, please, is a flat log state? The appearance of a log file...
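Since the round trip only works if time codes survive the translation, a quick check before re-importing can save some trouble. The following is a minimal sketch that compares the timing lines of the exported and the translated file; the function names and sample strings are hypothetical, and it assumes the usual SRT timing syntax.

```python
import re

# An SRT timing line looks like "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def timing_lines(srt_text: str) -> list[str]:
    """Collect all timing lines of an SRT text, in order."""
    return [
        line.strip()
        for line in srt_text.splitlines()
        if TIMING.match(line.strip())
    ]


def timings_preserved(original: str, translated: str) -> bool:
    """True if both texts contain exactly the same timing lines in the same order."""
    return timing_lines(original) == timing_lines(translated)


english = "1\n00:00:01,000 --> 00:00:03,500\nWelcome to the tutorial.\n"
german = "1\n00:00:01,000 --> 00:00:03,500\nWillkommen zum Tutorial.\n"
print(timings_preserved(english, german))
```

A passing check only confirms that the timing survived; the translated wording still needs the manual review described above.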

The translation by AI is not always helpful.

So here too: Trust (in AI) is good, control is better. Nevertheless, it can save a lot of time in both use cases, depending on the sound quality and level of sophistication of the language. You will want to revise the formatting of the subtitles for high-quality jobs, e.g. only a minimum time interval can be specified in the dialogue and different speakers are not recognised. Finally, line breaks and timing are not always perfect.

With the full version of DeepL, the subtitles remain in place.

Classification

Last but not least, another delicacy of the AI for audio: it can classify the sound of all clips according to criteria such as dialogue, effects, music or silence. It adds sub-categories to these categories if it has recognised something, such as sirens or dogs barking. These terms appear in the audio metadata under “Category” and “Subcategory” and can of course be corrected or added there. You can then sort the material into “Smart Bins” on this basis – making your work even easier.

Even the preview shows that the contours are not very precise.

Relight

Let’s move on to the image, where the AI has also learnt something new. Relight is intended to simplify tasks that previously required complex Power Windows with tracking and therefore could not take into account the spatial situation in the image. Similar to Depth Map, Relight calculates the spatial constellation in the scene. On this basis, you can then place a directional light source, a point light or a spotlight and subsequently change the lighting.

This creates a halo on the background without an additional mask.

Once again, this is as amazing as it is difficult. Just as Depth Map cannot replace a green screen, the separation from the background is also problematic here. This can already be seen in the map preview: The AI has recognised the spatial situation correctly, but the mask is not clearly separated from the distant background. If we try to soften the lighting in the foreground, the result is unfortunately a “halo”.

Although Relight offers very versatile adjustments and even a connection to the tracker, this fundamental problem cannot be solved with this alone. You need precise masks or a key again. An additional ‘magic mask’ can help to a certain extent, but it is not always clearly delineated enough. On the other hand, it is clear that there is considerable potential with green screen if the background is replaced anyway and the lighting situation needs to be adjusted afterwards.

At dusk, the radiation from lamps can be enhanced quite credibly.

Another use case would be “Day for Night”, or rather twilight instead of night. If you can’t replace all the lamps with more powerful ones, they often can’t compete with the rest of the light. In this case, the function is very well suited to convincingly amplifying the light in their surroundings. The advantage here is that the angle to the light, especially in buildings, is correctly taken into account. Because you often need several light sources for this, but the effect is computationally complex, you can use an analysis for several nodes. Casey Faris shows this quite well in a YouTube tutorial.

To understand the function: Relight does not generate a light source itself, but only defines its area of influence in 3D space. The result is a mask with grey scales, which then controls the effect of all the usual grading settings. Incidentally, the surface map is compatible with the method often used by 3D software for the alignment of surfaces. Such image data, usually called “normal maps”, can be exported and imported for corresponding tasks. This opens up far-reaching possibilities for the integration of real video and CGI.

The surface map corresponds to a normal map, green is horizontal, red areas point to the right and blue to the left.

Blackmagic Cloud

Previously, only the project data was exchanged in a cloud account at BM. The transfer and synchronisation of proxies or even raw material had to be done via a Dropbox or Google Drive account. This was obviously not always easy, so BM now offers its own cloud storage (currently still in beta). This costs 15 US dollars a month for 500 GB of storage space. For comparison: Dropbox Plus costs 12 euros per month or 99 euros per year for 2 TB of storage space, but you can only exchange up to 2 GB of data per day.

Blackmagic now also offers storage space in its own cloud.

Google’s cheapest offer initially costs 1.99 per month for 100 GB. This makes the BM Cloud seem quite expensive. But apart from the fact that you can cancel at any time, it is also more convenient. With the right settings, proxies are automatically generated and synchronised in the background. In addition, the desired storage space can be adjusted at any time, with precise billing after cancellation, even for parts of a month. The Project Libraries still cost an additional 5 US dollars.

It’s not cheap, but it’s convenient.

With a fast connection, DaVinci Remote Monitor can be used to stream a session in DR to another location. The image quality is so good that it is even possible to judge the image on a calibrated monitor.

Remote monitoring already works quite well.

All computers must be equipped with DR Studio and either Apple Silicon or, on Windows and Linux, an Nvidia RTX GPU; other GPUs are currently not supported. You can use the free app of the same name on an iPad or iPhone. In addition, all participants must have a BM Cloud account. Of course, the connections are protected with a session code.

Improvements and additions

So much for the most spectacular functions in the studio version. The AI-based Depth Map previously found in DR is now also available in Fusion. Magic Mask has been given new fine adjustments, similar to conventional keys, with an additional parameter for “Consistency”. This allows you to better define uneven mask edges over the course of the clip. For this to work, a “stroke” must exist over a sufficient number of individual frames.

The “Magic Mask” can now be fine-tuned.

You will often use the cache for such work, especially on somewhat weaker computers. If any inexplicable errors occur, it usually helps to simply delete the cache and regenerate it. Caching does not run smoothly yet, especially when combining AI-based tasks and tracking with masks. In addition, you should not work with a reduced “Timeline Proxy Resolution”; the other options for smoother work with limited computing power work better.

Timelines can now be quickly saved as individual backups and you can set the colour management for each timeline separately. The cut page has been given many new functions, including subtitling including the speech recognition described above, simpler split edits, as well as cut detection and the optional creation of empty spaces in the main track. Anyone still struggling with interlace video will breathe a sigh of relief, as the cut can be correctly limited to the full frame limit.

Timelines including presets are sent directly to the render queue by right-clicking.

The Media Page now offers the export and import of timelines in OpenTimelineIO format. Timelines can be transferred directly to the render queue from here, with immediate selection of a preset and the storage location. When rendering individual clips, the complete originals can be output instead of the edited version. Power Bins can now also be exported or imported. DR also understands the latest XML format from Final Cut Pro.

Each timeline can have its own colour management.

Fusion now recognises the USD format (Universal Scene Description), which is becoming increasingly widespread and is also supported by the free Blender. There is also a specialised toolset including MaterialX Framework and support for USD Hydra renderers. Multi Merge simplifies the combination of several layers and a native Depth Map can be found in the studio version. Clean Plates and Anaglyphs now have GPU support, and the splitter has also become much faster. Many will also be pleased with the 3D extrusion of shapes, with bevelled or rounded edges if desired. Corresponding shapes can also be designed with the sPolygon Node.

New formats

The latest SDKs have been integrated for BM and RED cameras, and XAVC H and HS from Sony now also work. Apple Log is supported, but unfortunately still no ProRes RAW. Playback of AC3 sound is finally available under Linux and Macs decode AAC with low latency. Compression to ProRes, AV1, H.264, H.265, MP3 and AAC is now offered for the MKV container, plus FFV1 in MKV or QuickTime. Customised render presets can be written as XML files and can be transferred to other systems to achieve identical results (provided the hardware encoders play along).

Speed

HDR material can be graded faster. On Apple Silicon, the flat noise reduction has become significantly faster, but the biggest speed gains this time are for the AI tools on the Nvidia GPUs with TensorRT. The first time the current version is started on a corresponding computer, an optimisation run is performed. As this is still a fairly new process, BM has cleverly provided a switch-off option this time. But even with newer AMD GPUs, such as the RX 7900, there are impressive speed gains for the AI tools.

Comment

With so many new features, users should be quite satisfied, shouldn’t they? There’s even more detail, including some long-cherished wishes. But unfortunately this time Blackmagic has managed to unleash an extremely unfinished version on the world. Presumably the IBC was the occasion to snatch software from the developers’ hands that would have been more suitable as a public beta. Even the hastily delivered 18.6.1 still caused considerable problems. Only 18.6.2, which we tested, runs reasonably smoothly.

For the future, we would like the developers to concentrate a little more on bug fixes and a little less on new features.

Speed on PC hardware

We would like to provide you with a few more benchmark results from PCs for our test of Mac Studio from DP 23:05. Thankfully, these were created with the dedicated help of several members of the German DR forum. They also showed that 18.6 was mostly faster than 18.5.

A Ryzen 9 3950X with 128GB RAM and the RTX 4090 with 24GB VRAM was already way ahead with 18.5.1, with 6:11 at UHD and 15:57 (!) at 8K. The values for H.265 (Setting Master) and DNxHR HQX 10 Bit were almost identical. 

A Ryzen 7 5800X with 48GB RAM and an RTX 4070 managed 13:25 in UHD in DNxHR, but needed significantly longer, at 21:05, for H.265 due to the lack of a hardware encoder. This computer did not manage 8K. On the Intel side, a "Hackintosh" based on the 13900KF with 64 GB RAM and the Radeon RX 6900XT was quite fast in UHD with 11:40; 8K was not tested.

Under Windows, an Intel i9-10940X with 64 GB and the GeForce RTX 2080 Ti needed 22:08 for H.265 and 21:22 for DNxHR HQX. An i9-9900K with 32 GB RAM and the GeForce RTX 3080 was significantly slower. Under DR 18.6, it took an hour and 6 minutes for H.265, and even a little longer for DNxHR. 8K was not tested.

An i7 9700K with 32 GB RAM and the inexpensive GeForce RTX 3060TI needed one hour and 34 minutes for UHD in DNxHR and two hours and 12 minutes for H.265 in 10-bit (8K was also out of the question here). The detailed results and their authors can be found here: bit.ly/bmd_forum

Another useful benchmark for Neatvideo is here, including the results of common devices: bit.ly/neatbench

The post Blackmagic DaVinci Resolve – the Sequel first appeared on DIGITAL PRODUCTION and was written by Uli Plank.