Relevance AI supports three models for converting audio to text, two of which can also convert video to text. Each model is available as its own Tool step:
  • Deepgram (audio and video)
  • AssemblyAI (audio and video)
  • OpenAI (only audio)
By using these steps, you can convert your audio and video into readable text for other Tools to use.

Add a ‘Convert audio/video to text’ Tool step to your Tool

You can add the ‘Convert audio/video to text’ Tool step to your Tool as follows:
  1. Create a new Tool, then search for one of the ‘Convert audio/video to text’ Tool steps
  2. Click ‘Expand’ to see the full Tool step
  3. Upload the file you want to convert
  4. Click ‘Run step’ to see the Tool step’s output!

Deepgram steps

If you use the Deepgram Tool step, you can select an option to recognise speaker changes.

OpenAI steps

Using this Tool step requires an OpenAI API Key.
If you use the OpenAI Tool step, you can select a Response Type. There are two options: ‘Transcript only’ and ‘Transcript + advanced metadata’.

Common errors

An error similar to the one below indicates that the input file is not in a supported format.
"err_code":"Unsupported Media Type","err_msg":"..
Use an audio/video editor to export your file to common audio/video formats and try again.
The error below occurs only when the file is too large (60 MB or more). If you are working with large video files, use a video editing tool such as QuickTime to extract the audio.
Request body larger than maxBodyLength limit
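
As a rough way to catch the size-related error before uploading, you could check the file size locally. The sketch below is not part of Relevance AI; the function names are hypothetical, and the threshold assumes that the "60+ MB" figure means 60 × 1024 × 1024 bytes, which may not match the exact limit the platform enforces.

```python
import os

# Assumption: "60+ MB" is read as 60 * 1024 * 1024 bytes; the real limit may differ.
MAX_UPLOAD_BYTES = 60 * 1024 * 1024


def is_too_large(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True when a file of this size is likely to hit the body-size limit."""
    return size_bytes >= limit


def describe_file(path: str) -> str:
    """Return a short message saying whether the file at `path` looks safe to upload."""
    size = os.path.getsize(path)
    size_mb = size / (1024 * 1024)
    if is_too_large(size):
        return f"{path}: {size_mb:.1f} MB, likely too large; extract the audio first"
    return f"{path}: {size_mb:.1f} MB, below the assumed limit"
```

Calling `describe_file` on a recording before you upload it gives an early hint about whether to extract the audio first.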