bgless.video

Transparent video elements for apps, edits, and APIs.

Drop in video, image, or prompt. Get alpha-ready assets with job history and API access.

WebM alpha

Browser overlays

MOV alpha

Editing suites

REST API

Batch pipelines

Ready

Recent generation tasks

Large previews from the latest alpha jobs in this account.

Loading recent tasks

SAM 3 video background remover

SAM 3 Background Remover — Open-Vocabulary Video Matting

Drive Meta SAM 3 with natural-language concept prompts to remove backgrounds from any video. MatAnyone refines edges to alpha-grade output suitable for transparent WebM, ProRes 4444, and PNG sequence delivery.

Start with a free preview

Open-vocabulary text prompts ("the person", "the red car") — no point-and-click

Native video mask propagation across frames — no per-frame drift

MatAnyone alpha refinement for hair, fur, and transparent edges

WebM VP9-alpha, MOV ProRes 4444, PNG sequence, GIF, WebP outputs

RunPod Serverless GPU dispatch with adaptive polling

Why SAM 3 changes video background removal

SAM 3 (Segment Anything Model 3) is the first promptable concept segmentation model from Meta that natively tracks objects across video frames in a single forward pass. Unlike SAM 2 + GroundingDINO pipelines that require a separate detector and per-frame mask propagation, SAM 3 understands phrases like “every red baseball cap” and emits temporally consistent masks for every matching instance. Combined with MatAnyone for alpha refinement, the result is broadcast-quality transparent video without a green screen.

Open-vocabulary text prompts replace bounding boxes and clicks

Concept-level instance segmentation in a single model

Native video tracking eliminates flicker between frames

Released November 2025 under the SAM 3 license (research + commercial with restrictions)

How our SAM 3 background remover works

Upload a video, optionally type a concept prompt, and our pipeline orchestrates the rest: source video is streamed to a Cloudflare R2 staging bucket, dispatched to a RunPod Serverless GPU endpoint that runs SAM 3 video-predict mode, refined frame-by-frame with MatAnyone for temporally stable alpha edges, then composited and encoded with FFmpeg into the format you chose. Most 30-second 1080p clips finish in under 90 seconds on an RTX A5000.

Source upload → Cloudflare R2 (presigned PUT)

SAM 3 inference on RunPod Serverless (A5000 / L40S)

MatAnyone trimap-based alpha refinement

FFmpeg encode (WebM VP9-alpha or ProRes 4444 with audio passthrough)

Output URL served from cdn.bgless.video

Model tiers and pricing

Choose the SAM 3 variant that fits your speed and quality budget. All tiers include MatAnyone refinement, audio passthrough, and unlimited preview generation.

sam3-tiny — fastest preview path, 45 credits/min, ideal for thumbnails

sam3-base — balanced quality, 60 credits/min, default consumer tier

sam3-pro — SAM 3 Large + MatAnyone refinement, 180 credits/min, the production tier with text prompt support

sam3-human — human-class prior, 45 credits/min, optimized for face cam and portrait video

Prompt modes

SAM 3 accepts four prompt modalities. Pick whichever matches your UX.

auto — no prompt, segments the dominant foreground (simplest UX)

text — natural language concept ("the dog", "the surfboard")

box — normalized bounding box on the first frame (precise control)

point — positive/negative click hints on the first frame (corrects edge cases)

Output formats and delivery

Every output preserves the source frame rate, dimensions (subject to your max-dimension cap), and audio track. Transparent formats carry true alpha; MP4 falls back to color/image/video background composition.

WebM (VP9 + alpha) — browser-native transparent video

MOV (ProRes 4444) — Adobe Premiere / DaVinci Resolve workflows

PNG sequence (ZIP) — frame-perfect compositing in After Effects

GIF / WebP — social and lightweight web embeds

MP4 (H.264) — universal playback when alpha is not required

Questions before you create

Do I need a green screen to use the SAM 3 background remover?

No. SAM 3 detects and segments subjects directly from any footage. Green screen is supported but never required.

How does SAM 3 differ from SAM 2 or Grounded-SAM 2?

SAM 3 is a single model that handles both concept-level open-vocabulary detection AND video mask propagation. SAM 2 needs a separate detector (e.g. GroundingDINO) for text prompts and per-frame work for video. SAM 3 is faster, more consistent, and accepts richer prompts.

What does the text prompt support look like?

Type a noun phrase such as "the person", "the basketball", or "every red car". SAM 3 returns masks for every instance matching the concept. Negative phrases like "not the shadow" further constrain the result.

Can I get transparent video output?

Yes. Choose WebM (VP9 with alpha) for browser playback, MOV (ProRes 4444) for editing software, or a PNG sequence ZIP for frame-perfect compositing.

How long does processing take?

On an RTX A5000, a 30-second 1080p clip typically completes in 60–90 seconds end-to-end. Longer or 4K clips scale roughly linearly.

Is the API publicly available?

Yes. POST /v1/jobs with X-Api-Key. See the API page for the full SDK and webhook documentation.