VOID VLM-Mask-Reasoner — Quadmask Generation

Generate 4-level semantic masks (quadmasks) for interaction-aware video inpainting with VOID.

Pipeline: Click points on object → SAM2 segments it → Gemini VLM reasons about interactions → SAM3 segments affected objects → Quadmask generated

Use the generated quadmask with the VOID inpainting demo.

Quadmask format

Pixel Value      Color           Meaning
0 (black)        Red overlay     Primary object to remove
63 (dark grey)   Yellow overlay  Overlap of primary + affected zone
127 (mid grey)   Green overlay   Affected region (shadows, reflections, physics)
255 (white)      Original        Background (keep as-is)
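
To make the encoding above concrete, here is a minimal sketch of how two binary masks for a single frame could be combined into a quadmask. It is an illustration only, not the code this tool runs: the function name `compose_quadmask` and the assumption that both inputs are 2D arrays of the same shape are hypothetical. The pixel values are the ones from the table.

```python
import numpy as np

# Pixel values taken from the quadmask format table.
PRIMARY, OVERLAP, AFFECTED, BACKGROUND = 0, 63, 127, 255


def compose_quadmask(primary: np.ndarray, affected: np.ndarray) -> np.ndarray:
    """Combine two binary masks into one uint8 quadmask (hypothetical helper).

    primary:  nonzero where the object to remove is.
    affected: nonzero where the affected region is (shadows, reflections, physics).
    """
    primary = primary.astype(bool)
    affected = affected.astype(bool)
    if primary.shape != affected.shape:
        raise ValueError("primary and affected masks must have the same shape")

    # Start from background, then paint each level.
    quad = np.full(primary.shape, BACKGROUND, dtype=np.uint8)
    quad[affected] = AFFECTED
    quad[primary] = PRIMARY
    # Pixels in both masks get the dedicated overlap value.
    quad[primary & affected] = OVERLAP
    return quad


if __name__ == "__main__":
    primary = np.array([[1, 1, 0, 0]])
    affected = np.array([[0, 1, 1, 0]])
    print(compose_quadmask(primary, affected))  # [[  0  63 127 255]]
```

For a video, the same function could be applied frame by frame, assuming the masks are stored as one array per frame.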