What is a VTT File?
A VTT file (WebVTT) is a text file that stores timed subtitles or captions for a video, designed primarily for web players. It maps lines of text to timecodes so captions appear and disappear at the right moments.
A VTT file (short for WebVTT, Web Video Text Tracks) is a plain-text subtitle and caption file format used with HTML5 video and many web-based players. It contains timecoded cues that tell a player what text to show and when to show it. Because it is text-based, it is easy to edit, translate, version, and review.
Why it matters
VTT is a common choice when your content is watched in a browser: product training, customer support walkthroughs, internal SOP videos, and onboarding modules. Adding a VTT track can:
- Improve accessibility for viewers who are deaf or hard of hearing.
- Help comprehension in noisy environments or when audio is muted.
- Support localization by swapping caption tracks per language.
- Make content more searchable when paired with a transcript workflow.
For teams shipping process documentation and training, captions reduce repeat questions and help people follow steps precisely.
How it works
A VTT file is saved with the .vtt extension and typically starts with a header:
WEBVTT
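After that, the file lists cues, separated by blank lines. Each cue has a start time and an end time joined by an arrow (-->), followed by the text to display. Times use an hours:minutes:seconds.milliseconds style (for example 00:01:12.500); the hours part can be omitted for times under one hour. A cue can also carry an optional identifier on the line above its timing.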
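The cue layout described above can be sketched in a few lines of Python. This is a minimal illustration rather than a full WebVTT writer: the function names and the sample caption text are made up for the example, and it assumes cues arrive as (start seconds, end seconds, text) tuples.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_vtt(cues: list[tuple[float, float, str]]) -> str:
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    blocks = ["WEBVTT"]  # the header must be the first line of the file
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"  # optional cue identifier
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}"
        )
    # A blank line separates the header and each cue.
    return "\n\n".join(blocks) + "\n"


# Hypothetical caption text, for illustration only.
sample = build_vtt([
    (1.0, 4.0, "Open the Settings page."),
    (72.5, 76.0, "Click Save to apply your changes."),
])
print(sample)
```

Saved with a .vtt extension, the printed text is a two-cue caption file whose second cue starts at 00:01:12.500.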
In a web page, a VTT file is commonly attached with the HTML <track> element placed inside a <video> element. The player reads the timecodes and renders the text over the video as closed captions when the viewer turns them on.
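For teams that generate their pages from a script, the markup might be assembled along these lines. This is a sketch under stated assumptions: the helper names and file names (help-video.mp4, captions.en.vtt, captions.es.vtt) are hypothetical, while kind, src, srclang, label, and default are standard attributes of the <track> element.

```python
from html import escape


def track_tag(src: str, srclang: str, label: str, default: bool = False) -> str:
    """Return one <track> element pointing at a captions file."""
    attrs = (
        f'kind="captions" src="{escape(src)}" '
        f'srclang="{escape(srclang)}" label="{escape(label)}"'
    )
    if default:
        attrs += " default"  # marks the track the player should enable first
    return f"<track {attrs}>"


def video_with_captions(video_src: str, tracks: list[dict]) -> str:
    """Wrap the caption tracks in a <video> element with player controls."""
    lines = [f'<video controls src="{escape(video_src)}">']
    lines += ["  " + track_tag(**track) for track in tracks]
    lines.append("</video>")
    return "\n".join(lines)


# Hypothetical file names, for illustration only.
markup = video_with_captions(
    "help-video.mp4",
    [
        {"src": "captions.en.vtt", "srclang": "en", "label": "English", "default": True},
        {"src": "captions.es.vtt", "srclang": "es", "label": "Español"},
    ],
)
print(markup)
```

Each language gets its own <track> line, and the label value is what viewers typically see in the player's caption menu.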
VTT vs SRT
Both VTT and SRT store timed subtitles, but VTT is more web-native.
- Web compatibility: VTT is widely supported in HTML5 players.
- Formatting and metadata: VTT can support additional features like positioning and simple styling in some players.
- File structure: SRT uses numeric cue indexes; VTT uses the WEBVTT header and can omit indexes.
If you are publishing to a platform or tool that specifically requests SRT, export SRT. If you are adding tracks to web video, VTT is often the safest default.
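When only an SRT export is available, the structural differences above are small enough to bridge with a short script. The following is a rough sketch, not a complete converter: the function name is invented for this example, and it assumes a simple SRT file without styling tags or positioning extensions. It relies on two well-known differences: SRT writes milliseconds after a comma where WebVTT uses a period, and WebVTT needs the WEBVTT header.

```python
import re

# SRT timing lines look like "00:00:01,000 --> 00:00:04,000".
SRT_TIMING = re.compile(
    r"(\d{2,}:\d{2}:\d{2}),(\d{3}) --> (\d{2,}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT text."""
    # Drop a leading byte-order mark and normalize Windows line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    # Swap the comma before the milliseconds for a period, on timing lines only,
    # so commas inside the caption text are left alone.
    text = SRT_TIMING.sub(r"\1.\2 --> \3.\4", text)
    # Numeric indexes are kept; WebVTT accepts them as cue identifiers.
    return "WEBVTT\n\n" + text + "\n"


# Hypothetical SRT content, for illustration only.
srt = (
    "1\n00:00:01,000 --> 00:00:04,000\nOpen the Settings page.\n\n"
    "2\n00:01:12,500 --> 00:01:16,000\nClick Save, then close the dialog.\n"
)
print(srt_to_vtt(srt))
```

For production use, a dedicated caption editor or converter is likely a safer choice, since real SRT files often contain variations this sketch does not handle.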
Best practices
- Keep lines short: Aim for readable chunks (often 1 to 2 lines on screen).
- Use consistent timing: Avoid flashes; keep cues long enough to read.
- Match terminology to your SOPs: Use the same button names and step wording as your documentation.
- Validate after editing: A missing header or malformed timecode can break the whole track.
- Localize per language: Create one VTT per language and label tracks clearly.
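The validation step in the list above can be partly automated. The check below is a minimal sketch and not a full WebVTT validator: the function names are invented for the example, and it only looks for a missing header, malformed timing lines, and cues whose end time is not after their start time. It also assumes a single space on each side of the arrow, which is stricter than the format requires.

```python
import re

# Hours are optional in WebVTT timestamps; milliseconds need three digits.
TIMESTAMP = r"(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}"
# Anything after the end time (cue settings such as position) is ignored here.
TIMING_LINE = re.compile(rf"^({TIMESTAMP}) --> ({TIMESTAMP})(?:\s.*)?$")


def to_ms(stamp: str) -> int:
    """Convert a WebVTT timestamp to milliseconds."""
    clock, ms = stamp.split(".")
    parts = [int(p) for p in clock.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad a missing hours field
    hours, minutes, seconds = parts
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(ms)


def check_vtt(text: str) -> list[str]:
    """Return the problems found; an empty list means none were detected."""
    problems = []
    lines = text.lstrip("\ufeff").splitlines()
    if not lines or not re.match(r"^WEBVTT(?:[ \t].*)?$", lines[0]):
        problems.append("line 1: missing WEBVTT header")
    for number, line in enumerate(lines, start=1):
        if "-->" not in line:
            continue  # only timing lines contain the arrow
        match = TIMING_LINE.match(line.strip())
        if not match:
            problems.append(f"line {number}: malformed timing line")
        elif to_ms(match.group(1)) >= to_ms(match.group(2)):
            problems.append(f"line {number}: end time is not after start time")
    return problems


# An SRT-style file saved as .vtt by mistake: no header, comma in the timestamps.
print(check_vtt("1\n00:00:01,000 --> 00:00:04,000\nHello\n"))
```

Running a check like this after every manual edit catches the two failure modes named above before the file reaches a player.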
With Vidocu, teams can generate subtitles from a screen recording, edit the text for accuracy, and translate captions for multilingual audiences before publishing training or help content.
Why it matters
- Web-first caption format: VTT (WebVTT) is designed for web video players and HTML5 text tracks.
- Timecoded cues: Each caption is tied to a start and end time so the player can show text at the right moment.
- Good for accessibility and comprehension: Captions help viewers follow along without sound and support accessibility requirements.
- Often compared with SRT: SRT and VTT are similar, but VTT is commonly preferred for browser-based playback and can support extra track features.
Examples
- A support team adds an English and Spanish VTT track to an HTML5 help video so customers can toggle captions.
- An L&D team exports VTT captions from a training recording and uploads them to a web player inside their LMS portal.
- A product ops team edits a VTT file to fix button names after a UI change, without re-recording the video.
- A global team generates translated VTT files for the same screencast and publishes one language track per region.
Frequently asked questions
What does VTT stand for?
VTT usually refers to WebVTT, which stands for Web Video Text Tracks.
Is a VTT file the same as a transcript?
Not exactly. A transcript is the full text of what was said, often without timing. A VTT file includes timecodes so text can be displayed in sync with video.
Can VTT be used for closed captions?
Yes. VTT is commonly used for closed captions, meaning the viewer can turn captions on or off in the player.
How do I convert SRT to VTT?
Many caption editors and converters can export VTT from an SRT file. The main differences are adding the WEBVTT header and changing the millisecond separator in each timestamp from a comma to a period.
Can one VTT file hold multiple languages?
A single VTT file typically represents one language track. For multilingual video, you attach multiple VTT files, one per language.
Why are my VTT captions not showing?
Common causes include a missing WEBVTT header, invalid timecodes, an incorrect file path or CORS settings, or the player not loading the track as default/selected.
Learn more
- AI Subtitles Generator — Create and edit subtitle tracks from a screen recording, then export for publishing.
- Video Translation — Translate captions into 65+ languages for localized training and support videos.
- Help Article Generator — Turn a walkthrough recording into a step-by-step help article with screenshots and clear instructions.
