Procedural Human-Guided Aesthetic Extensions
How AI agents extracted a live visual artist's identity from video, reproduced it in TouchDesigner, and generated novel extensions. A technical deep-dive into the Yousuke system built for AI Psychosis Summit NYC.
Abstract
The Research Question
System Overview
1,871 frames analyzed (sampled every 3 s from the 93-minute set)
40 k-means clusters (consolidated to 7 canonical techniques)
7 canonical techniques (the actual visual grammar)
43 GLSL shaders (21 originals + 21 mutations + 1 canon)
2,705 lines of shader code (across 43 pixel shaders)
3 compositing layers (additive blend, random selection)
60 FPS performance (at 0.2% CPU on a MacBook Pro)
234 tests passing (across 7 test files)
The Agent Stack
Video Analysis: OpenCV + scikit-learn k-means clustering on 19-float feature vectors extracted from 1,871 frames
Human Frame Curation: Operator screenshots fed as vision input to Claude Opus 4.7 via the Hermes harness
AI Effect Generation: Claude Opus 4.7 with 4-step validation pipeline (syntax, exports, test run, shape match)
Mutation Extension: Second-pass generation producing 21 controlled variants of the 21 originals
TouchDesigner Construction: Hermes Agent + twozero MCP bridge building 43 baseCOMPs programmatically via 36 native tools
Live Deployment: 3-layer additive compositing with frequency-band audio reactivity and beat-driven auto-rotation
Phase 1: Visual Identity Extraction
analyze_video.py samples the source video at 3-second intervals, producing 1,871 frames from the 93-minute set. Each frame gets downsampled to 64x64 pixels, then we extract a 19-float feature vector: 15 dominant color floats (k-means k=5 on the downsampled frame), edge density via Canny, mean brightness, mean saturation, and color variance. The feature matrix is normalized with StandardScaler and clustered with KMeans(k=40).
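As a minimal sketch of that pipeline (hypothetical helper structure; the real analyze_video.py may differ in detail):

import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def frame_features(frame: np.ndarray) -> np.ndarray:
    """Extract the 19-float feature vector from one sampled frame."""
    small = cv2.resize(frame, (64, 64))
    pixels = small.reshape(-1, 3).astype(np.float32)
    # 15 floats: 5 dominant colors via k-means (k=5) on the 64x64 frame
    colors = KMeans(n_clusters=5, n_init=4).fit(pixels).cluster_centers_
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    hsv = cv2.cvtColor(small, cv2.COLOR_BGR2HSV)
    return np.concatenate([
        colors.ravel(),                              # dominant colors (15)
        [cv2.Canny(gray, 100, 200).mean() / 255.0,   # edge density
         gray.mean(),                                # mean brightness
         hsv[..., 1].mean(),                         # mean saturation
         small.std()],                               # color variance
    ])

def cluster_frames(vectors: list) -> np.ndarray:
    """Normalize all 19-float vectors and assign 40 raw style clusters."""
    X = StandardScaler().fit_transform(np.stack(vectors))
    return KMeans(n_clusters=40).fit_predict(X)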
The 40 raw clusters consolidated into 7 distinct visual techniques. This is where the project's most valuable finding emerged: the canonical correction. I had hand-guessed 8 effects based on watching the set: Neon Contour, Particle Confetti, Voxel Explosion, Volumetric Rings, Shard Burst, Gold Particle Rain, Film Grain, and Kanji Float. K-means revealed that 7 of 8 were aesthetically wrong.
The actual visual grammar is chiaroscuro-bloom-chromatic with soft, indistinct light-boundary edges. What appeared to be "edge detection" in the source material was actually high-contrast luminance boundaries rendered through heavy bloom and chromatic aberration. The set's visual identity lives in the diffusion of light, not in its sharp delineation.
The edge_density: high signal from k-means had been detecting luminance-contrast boundaries between bloomed highlights and crushed blacks, not actual edge-detected contours. The distinction is subtle but critical.
The 7 Canonical Techniques
~45% Chiaroscuro Magenta Bloom: crushed blacks, blown highlights, magenta/pink/white
~12% Chiaroscuro Cyan Bloom: same technique, cool palette (cyan/white/blue)
~15% Crushed-Black Silhouette: extreme black crush, figure barely emerges
~3% Hazy Low-Contrast Dream: raised blacks, dusty rose, uniform fog
~4% Dark Atmospheric Macro: shallow DoF, equipment close-ups, warm shadows
~3% Pixel-Sort Radial Shards: radial pixel extrusion, crystalline needles
~7% Feedback Echo Tunnel: recursive frame compositing, hall of mirrors
Phase 2: Human-in-the-Loop Frame Curation
The Collaboration Insight
Phase 3: AI Effect Generation
generate_effect.py uses Claude Opus 4.7 to write runnable effect code. It supports four generation modes:
From frame: A screenshot is sent as a base64-encoded image alongside the plugin contract spec. The model analyzes the frame's visual properties and generates a shader that reproduces the style (sketched after this list).
From description: A text prompt describing the desired visual effect.
Extend: An existing effect's source code is sent for controlled variation.
From canonical: A catalog entry's visual signature and representative frame seed the generation.
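A sketch of the from-frame mode using the standard Anthropic Python SDK; PLUGIN_CONTRACT and the model ID are placeholders for the project's actual spec and configuration:

import base64
import anthropic

PLUGIN_CONTRACT = "..."  # the effect plugin interface spec (placeholder)

def effect_from_frame(png_path: str, model_id: str) -> str:
    """Ask the model to write an effect that reproduces a curated frame."""
    with open(png_path, "rb") as f:
        b64 = base64.standard_b64encode(f.read()).decode("ascii")
    client = anthropic.Anthropic()
    msg = client.messages.create(
        model=model_id,  # the project's Opus model ID
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",
                            "data": b64}},
                {"type": "text",
                 "text": PLUGIN_CONTRACT
                         + "\nWrite an effect reproducing this frame's style."},
            ],
        }],
    )
    return msg.content[0].text  # generated effect source code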
Every generated effect passes through a 4-step validation pipeline before being saved to disk. On validation failure, the error message and rejected code are fed back to the model for up to 2 automatic retries. This self-correcting loop significantly improved the first-pass success rate. The output: 21 original Python effects, 2 canonical effects, and 42 GLSL shaders (21 originals + 21 mutations).
The Validation Pipeline
Every AI-generated effect must pass all 4 steps before being saved. On failure, the error and rejected code are fed back for up to 2 retries.
# 1. Syntax check: catch malformed Python
ast.parse(source_code)
# 2. Export check: required interface
assert "EFFECT_META" in module.__dict__
assert callable(module.fx_function)
# 3. Test run: functional verification
result = module.fx_function(
    np.zeros((480, 640, 3), dtype=np.uint8),
    MockAudioFeatures(),
    {},
)
# 4. Shape match: output contract
assert result.shape == (480, 640, 3)
assert result.dtype == np.uint8
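The self-correcting loop itself is short. A sketch, with generate_effect and validate_effect standing in for the project's actual functions:

def generate_with_retries(prompt: str, max_retries: int = 2) -> str:
    """Feed validation failures back to the model for up to 2 retries."""
    feedback = ""
    for attempt in range(max_retries + 1):
        source = generate_effect(prompt + feedback)   # model call
        ok, error = validate_effect(source)           # 4-step pipeline above
        if ok:
            return source
        # Return the error message and the rejected code to the model
        feedback = (f"\n\nAttempt {attempt + 1} failed validation:\n{error}"
                    f"\n\nRejected code:\n{source}")
    raise RuntimeError("Effect failed validation after all retries")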
Phase 4: The Mutation Strategy
Mutation Example: Confetti Particle Storm to Acid Confetti
Key differences: particle count 30 → 90, palette rainbow → cyan/lime/magenta, gravity direction down → up, audio driver bass → highs. The core technique (hash-based particles, starfield, body tint) is preserved.
// PARENT: Confetti Particle Storm
// Pink body tint, 30 particles, bass-driven downward fall
vec3 body = mix(src.rgb, vec3(1.0, 0.4, 0.7), 0.4 * step(0.15, luma));
for (int i = 0; i < 30; i++) {
    // ...
    vec2 pos = vec2(
        fract(seed + sin(t * 0.3) * 0.3),
        fract(fi * 0.0371 - t * 0.15 * (0.5 + bass))  // downward
    );
    float size = 0.005 + bass * 0.008;
    vec3 cc = 0.5 + 0.5 * cos(6.28 * (seed * 3.0 + vec3(0, 0.33, 0.67)));
}
// MUTATION: Acid Confetti
// Cyan/lime body tint, 90 particles, highs-driven upward float
vec3 body = mix(src.rgb, vec3(0.0, 1.0, 0.7), 0.5 * step(0.15, luma));
for (int i = 0; i < 90; i++) {
    // ...
    vec2 pos = vec2(
        fract(seed + sin(t * 0.3) * 0.4),
        fract(fi * 0.0111 + t * 0.2 * (0.5 + highs))  // upward
    );
    float size = 0.004 + highs * 0.01;
    // Cyan/lime/magenta palette (discrete, not continuous)
    if (sel < 0.33) cc = vec3(0.0, 1.0, 1.0);
    else if (sel < 0.66) cc = vec3(0.5, 1.0, 0.0);
    else cc = vec3(1.0, 0.0, 1.0);
}
Phase 5: Autonomous TouchDesigner Construction
The Hermes Agent built the network through the twozero MCP bridge, a local server on localhost:40404 that translates MCP tool calls into TouchDesigner Python API calls.
The build sequence executed five scripts in order:
1. td_build_effects.py creates 21 baseCOMPs, each containing an inTOP, glslTOP (with the full pixel shader source), and outTOP. 1,347 lines of GLSL across 21 shaders.
2. td_build_mutations.py uses the same architecture for 21 additional baseCOMPs with mutation shader variants. 1,358 lines of GLSL.
3. td_wire_all.py wires all 43 effect outputs to the 3-router compositing topology: effect_router, layer2_router, and layer3_router, each receiving all 43 inputs.
4. td_add_prominence.py inserts a levelTOP between the glslTOP and outTOP inside each baseCOMP, with opacity driven by frequency bands (bass for effects 0-13, mids for 14-28, highs for 29-42) and a 30% beat flash on detected beats.
5. td_update_rotation.py writes the auto-rotate chopexecuteDAT script with aggressive parameters: 1.5s switch interval, 5-beat threshold, random.sample(range(N), 3) for 3-effect selection.
Zero compile errors. 60 FPS. 0.2% CPU on a MacBook Pro. The production .toe file is 40 KB.
The MCP Bridge Protocol
The twozero bridge translates JSON-RPC calls into TouchDesigner Python API calls. All build scripts use this pattern with retry logic and 120-second timeouts.
import json
import time
import urllib.request

MCP_URL = "http://localhost:40404/mcp"

def td_call(tool: str, args: dict, retries: int = 2) -> str:
    """Call a twozero MCP tool and return text content."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": tool, "arguments": args},
    }
    data = json.dumps(payload).encode("utf-8")
    for attempt in range(retries + 1):
        try:
            req = urllib.request.Request(
                MCP_URL, data=data,
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req, timeout=120) as resp:
                result = json.loads(resp.read().decode("utf-8"))
            content = result["result"]["content"]
            return "\n".join(
                c["text"] for c in content if c["type"] == "text"
            )
        except Exception:
            if attempt == retries:
                raise
            time.sleep(1)
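A build script can then create an effect container in a couple of calls. The tool names and arguments below are illustrative assumptions, not the bridge's documented API:

# Hypothetical tool names; the real twozero tool set may differ.
td_call("create_operator", {
    "parent": "/project1/effects",
    "op_type": "baseCOMP",
    "name": "fx21_acid_confetti",
})
td_call("set_parameter", {
    "path": "/project1/effects/fx21_acid_confetti/glsl1",
    "parameter": "pixeldat",   # DAT that holds the pixel shader source
    "value": "shader_acid_confetti",
})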
3-Layer Compositing Architecture
On each rotation, three distinct effects are chosen via random.sample(range(N), 3), one per router.
Camera Input: videodevinTOP supports MacBook Pro Camera, OBS Virtual Camera, and iPhone via USB (Continuity Camera)
Effect Processing: 43 baseCOMPs each apply a unique GLSL pixel shader (inTOP to glslTOP to prominence levelTOP to outTOP)
3-Router Selection: effect_router + layer2_router + layer3_router each select from the full bank of 43 effects
Additive Blending: blend_add1 composites layers 1+2, blend_add2 adds layer 3 (compositeTOP, add mode)
Master Level: blend_level applies global brightness/contrast scaling
Output: main_output windowCOMP at 1280x720 to projector or secondary display
rms, full spectrum (0-1): overall energy level drives shader intensity scaling
sub_bass, 0-80 Hz (0-1): sub-bass rumble drives slow displacement and deep pulse
bass, 80-300 Hz (0-1): kick drums drive particle velocity, UV displacement, and prominence for effects 0-13
mids, 300-3000 Hz (0-1): vocals and synths drive color modulation and prominence for effects 14-28
highs, 3000 Hz+ (0-1): hi-hats and cymbals drive fine detail, confetti density, and prominence for effects 29-42
beat, trigger (0/1): beat onset triggers the 30% brightness flash and auto-rotation (every 5 beats)
onset, transient (0-1): transient energy envelope gates effect switching (threshold 0.2)
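In TouchDesigner Python, the band-to-prominence mapping might look roughly like this; operator paths and channel names are assumptions, not the project's actual network:

# Runs per frame, e.g. from a CHOP Execute DAT.
def update_prominence(effect_index: int):
    audio = op('/project1/audio_analysis')        # analysis CHOP (assumed)
    if effect_index <= 13:
        band = 'bass'
    elif effect_index <= 28:
        band = 'mids'
    else:
        band = 'highs'
    level = op(f'/project1/effects/fx{effect_index}/level_prominence')
    # Band level sets the base opacity; beats add a 30% flash on top.
    base = float(audio[band])
    flash = 0.3 * float(audio['beat'])
    level.par.opacity = min(1.0, base + flash)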
Performance Engineering
11.72 ms slowest Python effect (Kanji Float at 85 FPS effective, well within the 33.3 ms budget)
60 FPS GLSL chain (0.2% CPU on a MacBook Pro, GPU-accelerated)
42.5 ms → 1.4 ms Shard Burst optimization (-97% via vectorized rotation and a single fillPoly call; sketched below)
35 ms → 7.8 ms Film Grain optimization (-78% via half-res uniform RNG, no HSV conversion)
234 tests passing (smoke, audio, plugin, render, perf, video, generation)
1.5 s auto-rotate interval (5-beat threshold, 0.8 s onset debounce)
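The Shard Burst fix illustrates the general pattern: batch the per-shard rotations into one broadcast matrix multiply and issue a single draw call. A minimal sketch of the idea, not the project's exact code:

import cv2
import numpy as np

def draw_shards(frame, centers, angles, base_tri, color=(255, 215, 0)):
    """Rotate one triangle template to every shard and draw all at once.

    centers: (N, 2) pixel positions, angles: (N,) radians, base_tri: (3, 2).
    """
    c, s = np.cos(angles), np.sin(angles)
    # (N, 2, 2) rotation matrices, built without a Python-level loop
    rot = np.stack([np.stack([c, -s], -1), np.stack([s, c], -1)], axis=1)
    # Broadcast matmul: (1, 3, 2) @ (N, 2, 2) -> (N, 3, 2) rotated triangles
    tris = base_tri[None] @ rot.transpose(0, 2, 1) + centers[:, None, :]
    # One fillPoly call replaces N separate per-shard draw calls
    cv2.fillPoly(frame, tris.astype(np.int32), color)
    return frame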
Lessons Learned
constantCHOP does not support item assignment: writing ae['effect_idx'] = val throws a TypeError on every frame; the correct pattern is ae.par.value0 = val.
The auto-rotate callback must use whileOn, not onValueChange: the latter stops firing when audio values go static during silence, which stalls the entire rotation (see the callback sketch below).
The keyboardinDAT requires focusselect='anywhere', or it only responds when its viewer panel has focus.
A switchTOP with zero inputs defaults to 128x128 resolution, regardless of custom parameters.
When inserting a levelTOP between existing operators, you must explicitly disconnect the downstream input first: our initial td_add_prominence.py script had a wiring bug that left all 43 output nodes disconnected.
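In callback form, the working rotation pattern looks roughly like this; node names and paths are assumptions:

# CHOP Execute DAT callback sketch: rotate from whileOn, not onValueChange.
import random

def onWhileOn(channel, sampleIndex, val, prev):
    ae = op('/project1/active_effects')          # constantCHOP (assumed)
    beats = int(op('/project1/beat_counter')['count'])
    if beats % 5 == 0:                           # 5-beat threshold
        # Pick 3 distinct effect indices. Set constantCHOP values via
        # pars; item assignment (ae['x'] = v) raises TypeError.
        idx = random.sample(range(43), 3)
        ae.par.value0, ae.par.value1, ae.par.value2 = idx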
Process Insights
The canonical correction was the single most valuable finding. Without algorithmic analysis, the system would have shipped with a visual identity that looked nothing like the source material. Human intuition alone produced effects that were visually interesting but aesthetically wrong.
Human curation remained indispensable despite the power of algorithmic clustering. The best effects were generated from human-selected frames. The statistical analysis told us what was common in the source. The human selection told us what was good.
Mutation was more valuable than generation. The 21 mutation effects expanded the visual vocabulary more efficiently than generating 21 completely new effects would have. Mutations inherit the parent's core technique while varying specific parameters, producing a family of related looks rather than a collection of unrelated ones.
The MCP bridge approach scales. Building 43 effects programmatically through the bridge was substantially faster and more reliable than manual TouchDesigner interaction would have been. The bridge also enabled rapid iteration: when a shader didn't look right, the build script could be re-run with modified source code in seconds.
Open Source
Built by Mauricio Trujillo Ramirez aka Bunny / Tektonic Company.
Powered by Claude Opus 4.7 (Anthropic) for AI effect generation.
Powered by Hermes Agent (Nous Research) for autonomous TouchDesigner construction.
MCP bridge by twozero (404.zero + setupdesign).
Special thanks to SHL0MS, Nous Research, 404.zero, and setupdesign.

