
Procedural Human-Guided Aesthetic Extensions

How AI agents extracted a live visual artist's identity from video, reproduced it in TouchDesigner, and generated novel extensions. A technical deep-dive into the Yousuke system built for AI Psychosis Summit NYC.

Mauricio Trujillo Ramirez / Tektonic Company · Nous Research / Hermes Agent & TD Skill · 2026-04-29 · 24 min read

Abstract

We used AI agents and human curation to extract a live visual identity from 93 minutes of YOUSUKE YUKIMATSU's Boiler Room Tokyo x Super Dommune set, reproduced it as 43 real-time GLSL shaders in TouchDesigner, and extended it into novel visual territory. All of it deployed live at AI Psychosis Summit NYC on April 30, 2026. The system was built with Claude Opus 4.7 (Anthropic) for vision-based effect generation and Nous Research's Hermes Agent for autonomous TouchDesigner construction via the twozero MCP bridge. The result: 2,705 lines of shader code, 3-layer additive compositing, frequency-band audio reactivity, and 60 FPS at 0.2% CPU. The complete system is open source.

The Research Question

Can AI agents extract a human artist's live visual identity from video, reproduce it in TouchDesigner, and generate novel extensions of that identity? The subject was YOUSUKE YUKIMATSU's Boiler Room Tokyo x Super Dommune set, a 93-minute live performance whose visuals (by Bridge) define a specific aesthetic language built on chiaroscuro bloom, chromatic aberration, and crushed-black silhouettes. The goal was not to build generic audio-reactive visuals. It was to capture this particular artist's visual identity and extend it into new territory.

The venue was AI Psychosis Summit NYC, April 30, 2026. The system needed to run live, in real time, processing audio input and driving visual output on a projector.

Here's why this matters: generative art systems typically produce generic aesthetics. Procedural noise, reaction-diffusion, particle systems. This project asked whether AI agents could produce generative art that is derivative of a specific human aesthetic, not a generic one. The answer, documented here, is yes, with significant caveats about the indispensable role of human curation in the loop.

System Overview

1,871 frames analyzed (sampled every 3 s from the 93-minute set)

40 k-means clusters (consolidated to 7 canonical techniques)

7 canonical techniques (the actual visual grammar)

43 GLSL shaders (21 originals + 21 mutations + 1 canon)

2,705 lines of shader code (across 43 pixel shaders)

3 compositing layers (additive blend, random selection)

60 FPS (at 0.2% CPU on a MacBook Pro)

234 tests passing (across 7 test files)

The Agent Stack

We built Yousuke through a 6-stage pipeline spanning computer vision, human curation, AI code generation, and autonomous TouchDesigner construction. Each stage feeds the next, with human oversight at every critical decision point.
1. Video Analysis: OpenCV + scikit-learn k-means clustering on 19-float feature vectors extracted from 1,871 frames

2. Human Frame Curation: Operator screenshots fed as vision input to Claude Opus 4.7 via the Hermes harness

3. AI Effect Generation: Claude Opus 4.7 with a 4-step validation pipeline (syntax, exports, test run, shape match)

4. Mutation Extension: Second-pass generation producing 21 controlled variants of the 21 originals

5. TouchDesigner Construction: Hermes Agent + twozero MCP bridge building 43 baseCOMPs programmatically via 36 native tools

6. Live Deployment: 3-layer additive compositing with frequency-band audio reactivity and beat-driven auto-rotation

Phase 1: Visual Identity Extraction

The first stage was objective analysis. analyze_video.py samples the source video at 3-second intervals, producing 1,871 frames from the 93-minute set. Each frame gets downsampled to 64x64 pixels, then we extract a 19-float feature vector: 15 dominant-color floats (k-means, k=5, on the downsampled frame), edge density via Canny, mean brightness, mean saturation, and color variance. The feature matrix is normalized with StandardScaler and clustered with KMeans(k=40). The 40 raw clusters were then consolidated into 7 distinct visual techniques.

This is where the project's most valuable finding emerged: the canonical correction. I had hand-guessed 8 effects based on watching the set: Neon Contour, Particle Confetti, Voxel Explosion, Volumetric Rings, Shard Burst, Gold Particle Rain, Film Grain, and Kanji Float. K-means revealed that 7 of the 8 were aesthetically wrong. The actual visual grammar is chiaroscuro-bloom-chromatic, with soft, indistinct light-boundary edges. What appeared to be "edge detection" in the source material was actually high-contrast luminance boundaries rendered through heavy bloom and chromatic aberration. The set's visual identity lives in the diffusion of light, not in its sharp delineation. The high edge_density signal from k-means had been picking up luminance-contrast boundaries between bloomed highlights and crushed blacks, not edge-detected contours. The distinction is subtle but critical.
Hand-guessed effect | What I imagined | What the footage actually shows
Edge detection | sharp TRON-cyberpunk contours | luminance boundaries through bloom
Confetti particles | silhouette edge spawning | does not appear in the set
Gold rain | golden downward cascade | gold only appears as LED reflections
Concentric rings | drawn expanding halos | feedback echo artifacts, not drawn shapes
Voronoi fracture | shard burst patterns | pixel-sort radial extrusions
Kanji overlays | drifting CJK glyphs | no CJK text, only the broadcast watermark
Voxel grids | pixelated block displacement | no pixelated block look anywhere
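
For reference, the sampling-and-clustering pipeline described above reduces to a few dozen lines of OpenCV and scikit-learn. This is a minimal sketch under stated assumptions (file names, Canny thresholds, and variable names are illustrative, not the actual analyze_video.py internals):

python
# Minimal sketch of the Phase 1 analysis. Parameters and names are
# illustrative assumptions, not copied from analyze_video.py.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def frame_features(frame_bgr: np.ndarray) -> np.ndarray:
    """64x64 downsample -> 19 floats: 5 dominant colors (15 values),
    Canny edge density, mean brightness, mean saturation, color variance."""
    small = cv2.resize(frame_bgr, (64, 64), interpolation=cv2.INTER_AREA)
    pixels = small.reshape(-1, 3).astype(np.float32) / 255.0
    dominant = KMeans(n_clusters=5, n_init=4, random_state=0).fit(pixels)
    colors = dominant.cluster_centers_.flatten()              # 15 floats
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    edge_density = cv2.Canny(gray, 100, 200).mean() / 255.0   # thresholds assumed
    hsv = cv2.cvtColor(small, cv2.COLOR_BGR2HSV)
    brightness = hsv[..., 2].mean() / 255.0
    saturation = hsv[..., 1].mean() / 255.0
    color_var = float(pixels.var())
    return np.concatenate([colors, [edge_density, brightness, saturation, color_var]])

# Sample one frame every 3 seconds from the source set (filename hypothetical).
cap = cv2.VideoCapture("boiler_room_tokyo.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
frames, ok = [], True
while ok:
    cap.set(cv2.CAP_PROP_POS_FRAMES, len(frames) * int(fps * 3))
    ok, frame = cap.read()
    if ok:
        frames.append(frame)

# (N, 19) feature matrix, normalized, then clustered into 40 raw clusters.
features = np.stack([frame_features(f) for f in frames])
scaled = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=40, random_state=0).fit_predict(scaled)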

The 7 Canonical Techniques

Chiaroscuro Magenta Bloom (~45%): crushed blacks, blown highlights, magenta/pink/white

Chiaroscuro Cyan Bloom (~12%): same technique, cool palette (cyan/white/blue)

Crushed-Black Silhouette (~15%): extreme black crush, figure barely emerges

Hazy Low-Contrast Dream (~3%): raised blacks, dusty rose, uniform fog

Dark Atmospheric Macro (~4%): shallow DoF, equipment close-ups, warm shadows

Pixel-Sort Radial Shards (~3%): radial pixel extrusion, crystalline needles

Feedback Echo Tunnel (~7%): recursive frame compositing, hall of mirrors

Phase 2: Human-in-the-Loop Frame Curation

Algorithmic clustering gave us statistical accuracy, but it missed aesthetic intent. I took screenshots of specific frames from the set that captured the feeling of the visual identity. The mood, the atmosphere, the emotional weight. These are qualities that a 19-float feature vector simply cannot encode. I fed those screenshots directly to Claude Opus 4.7 via the Hermes harness as vision input. The AI analyzed each frame's visual properties (luminance distribution, color palette, edge characteristics, bloom behavior) and generated GLSL shaders reproducing the style. This hybrid approach produced dramatically better results than either method alone. Clustering alone captured the average but missed the exceptional. Human selection alone would have been limited by my ability to articulate visual properties in technical terms. Describing "chiaroscuro bloom with chromatic aberration at luminance boundaries" is harder than pointing at a frame and saying "like this." Together they produced a visual vocabulary that neither could achieve alone: the human curates intent, the AI executes with precision.
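
Mechanically, a curated frame reaches the model as a base64-encoded image plus an analysis prompt. A minimal sketch of that packaging, assuming a generic Anthropic messages call rather than the actual Hermes harness plumbing (model id, file path, and prompt wording are placeholders):

python
# Hedged sketch: packaging a curated frame as vision input. The real pipeline
# routes through the Hermes harness; this only shows the generic shape of a
# messages-API call with an image block. Names below are placeholders.
import base64
import anthropic

client = anthropic.Anthropic()

with open("curated_frames/frame_0412.png", "rb") as f:   # hypothetical path
    image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-opus-4",    # placeholder model id
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            {"type": "text",
             "text": "Analyze this frame's luminance distribution, palette, edge "
                     "characteristics, and bloom behavior, then write a GLSL pixel "
                     "shader reproducing the style."},
        ],
    }],
)
shader_source = response.content[0].text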

The Collaboration Insight

AI-driven visual identity extraction works best as a collaboration, not an automation. The human curates intent. The AI executes with precision. The AI catches statistical patterns the human misses. The human catches emotional resonance the AI misses. The canonical correction was the most valuable finding of the entire project. Without algorithmic analysis, the system would have shipped with a visual identity that looked nothing like the source material. But without human curation, the effects would have been statistically representative yet aesthetically lifeless.

Phase 3: AI Effect Generation

generate_effect.py uses Claude Opus 4.7 to write runnable effect code. It supports four generation modes:

From frame: a screenshot is sent as a base64-encoded image alongside the plugin contract spec. The model analyzes the frame's visual properties and generates a shader that reproduces the style.

From description: a text prompt describing the desired visual effect.

Extend: an existing effect's source code is sent for controlled variation.

From canonical: a catalog entry's visual signature and representative frame seed the generation.

Every generated effect passes through a 4-step validation pipeline before being saved to disk. On validation failure, the error message and rejected code are fed back to the model for up to 2 automatic retries. This self-correcting loop significantly improved the first-pass success rate. The output: 21 original Python effects, 2 canonical effects, and 42 GLSL shaders (21 originals + 21 mutations).

The Validation Pipeline

Every AI-generated effect must pass all 4 steps before being saved. On failure, the error and rejected code are fed back for up to 2 retries.

python
import ast
import numpy as np

# 1. Syntax check: catch malformed Python
ast.parse(source_code)

# 2. Export check: required interface
assert "EFFECT_META" in module.__dict__
assert callable(module.fx_function)

# 3. Test run: functional verification
result = module.fx_function(
    np.zeros((480, 640, 3), dtype=np.uint8),
    MockAudioFeatures(),
    {}
)

# 4. Shape match: output contract
assert result.shape == (480, 640, 3)
assert result.dtype == np.uint8
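
The retry behavior wrapping those four checks is a simple loop. A sketch of the self-correcting structure, with generate_effect() and validate_effect() as hypothetical stand-ins for the model call and the pipeline above:

python
# Sketch of the self-correcting loop. generate_effect() and validate_effect()
# are hypothetical stand-ins; only the retry structure is the point here.
MAX_RETRIES = 2

def generate_effect(prompt: str) -> str: ...     # stand-in: model call
def validate_effect(source: str) -> None: ...    # stand-in: raises on failure

def generate_validated_effect(prompt: str) -> str:
    feedback = ""
    for attempt in range(MAX_RETRIES + 1):
        source = generate_effect(prompt + feedback)
        try:
            validate_effect(source)   # syntax, exports, test run, shape match
            return source             # passed: safe to save to disk
        except Exception as err:
            # Feed the error and the rejected code back for the next attempt.
            feedback = (
                f"\n\nPrevious attempt failed validation: {err}\n"
                f"Rejected code:\n{source}\nFix the issue and regenerate."
            )
    raise RuntimeError("effect failed validation after retries")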

Phase 4: The Mutation Strategy

The first pass produced 21 original effects that faithfully reproduce the source material's visual identity. The second pass doubled the vocabulary: 21 mutation effects. Same DNA, new expressions. Each mutation was prompted with the parent effect's GLSL source and explicit instructions to preserve the core visual technique while varying at least 3 specific aspects: color palette, particle behavior, displacement function, feedback intensity, or temporal dynamics. The result is a family of related looks rather than a collection of unrelated ones. Mutations are not random perturbations. They are controlled genetic variations. For example, Confetti Particle Storm (30 particles, rainbow palette, bass-driven downward gravity) mutated into Acid Confetti (90 particles, cyan/lime/magenta palette, highs-driven upward gravity). The core technique (hash-based particle rendering over a starfield with body tinting) is preserved. The expression is entirely different. This approach proved more efficient than generating 21 completely new effects. Mutations inherit the parent's core technique, so they are aesthetically coherent by construction. The visual vocabulary expanded without losing the thread of the original identity.

Mutation Example: Confetti Particle Storm to Acid Confetti

Key differences: particle count 30 to 90, palette rainbow to cyan/lime/magenta, gravity direction down to up, audio driver bass to highs. Core technique (hash-based particles, starfield, body tint) preserved.

glsl
// PARENT: Confetti Particle Storm
// Pink body tint, 30 particles, bass-driven downward fall
vec3 body = mix(src.rgb, vec3(1.0, 0.4, 0.7), 0.4 * step(0.15, luma));
for (int i = 0; i < 30; i++) {
    // ...
    vec2 pos = vec2(
        fract(seed + sin(t * 0.3) * 0.3),
        fract(fi * 0.0371 - t * 0.15 * (0.5 + bass))  // downward
    );
    float size = 0.005 + bass * 0.008;
    vec3 cc = 0.5 + 0.5 * cos(6.28 * (seed * 3.0 + vec3(0, 0.33, 0.67)));
}

// MUTATION: Acid Confetti
// Cyan/lime body tint, 90 particles, highs-driven upward float
vec3 body = mix(src.rgb, vec3(0.0, 1.0, 0.7), 0.5 * step(0.15, luma));
for (int i = 0; i < 90; i++) {
    // ...
    vec2 pos = vec2(
        fract(seed + sin(t * 0.3) * 0.4),
        fract(fi * 0.0111 + t * 0.2 * (0.5 + highs))  // upward
    );
    float size = 0.004 + highs * 0.01;
    // Cyan/lime/magenta palette (discrete, not continuous)
    if (sel < 0.33) cc = vec3(0.0, 1.0, 1.0);
    else if (sel < 0.66) cc = vec3(0.5, 1.0, 0.0);
    else cc = vec3(1.0, 0.0, 1.0);
}
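
The prompting behind a mutation is equally simple: the parent's GLSL source plus an explicit instruction to preserve the core technique while varying at least 3 specific aspects. A rough sketch of how such a prompt might be assembled (the wording is illustrative, not the production prompt in generate_effect.py):

python
# Rough sketch of mutation prompt assembly. The aspect list comes from the
# description above; the prompt wording itself is an assumption.
MUTATION_ASPECTS = [
    "color palette", "particle behavior", "displacement function",
    "feedback intensity", "temporal dynamics",
]

def build_mutation_prompt(parent_name: str, parent_glsl: str) -> str:
    return (
        f"Here is the GLSL pixel shader for the effect '{parent_name}':\n\n"
        f"{parent_glsl}\n\n"
        "Write a mutation of this effect. Preserve the core visual technique "
        "exactly, but vary at least 3 of the following aspects: "
        + ", ".join(MUTATION_ASPECTS) + ". "
        "Return only the complete GLSL pixel shader."
    )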

Phase 5: Autonomous TouchDesigner Construction

The Nous Research Hermes Agent, equipped with the TouchDesigner skill providing 36 native tools, constructed the complete TD network through the twozero MCP bridge: a JSON-RPC 2.0 server by 404.zero running on localhost:40404 that translates MCP tool calls into TouchDesigner Python API calls. The build sequence executed five scripts in order:

1. td_build_effects.py creates 21 baseCOMPs, each containing an inTOP, glslTOP (with the full pixel shader source), and outTOP. 1,347 lines of GLSL across 21 shaders.

2. td_build_mutations.py uses the same architecture for 21 additional baseCOMPs with mutation shader variants. 1,358 lines of GLSL.

3. td_wire_all.py wires all 43 effect outputs to the 3-router compositing topology: effect_router, layer2_router, and layer3_router, each receiving all 43 inputs.

4. td_add_prominence.py inserts a levelTOP between the glslTOP and outTOP inside each baseCOMP, with opacity driven by frequency bands (bass for effects 0-13, mids for 14-28, highs for 29-42) and a 30% beat flash on detected beats.

5. td_update_rotation.py writes the auto-rotate chopexecuteDAT script with aggressive parameters: a 1.5 s switch interval, a 5-beat threshold, and random.sample(range(N), 3) for 3-effect selection.

Zero compile errors. 60 FPS. 0.2% CPU on a MacBook Pro. The production .toe file is 40 KB.

The MCP Bridge Protocol

The twozero bridge translates JSON-RPC calls into TouchDesigner Python API calls. All build scripts use this pattern with retry logic and 120-second timeouts.

python
import json
import time
import urllib.request

MCP_URL = "http://localhost:40404/mcp"

def td_call(tool: str, args: dict, retries: int = 2) -> str:
    """Call a twozero MCP tool and return text content."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": tool, "arguments": args},
    }
    data = json.dumps(payload).encode("utf-8")
    for attempt in range(retries + 1):
        try:
            req = urllib.request.Request(
                MCP_URL, data=data,
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req, timeout=120) as resp:
                result = json.loads(resp.read().decode("utf-8"))
            content = result["result"]["content"]
            return "\n".join(
                c["text"] for c in content if c["type"] == "text"
            )
        except Exception:
            if attempt == retries:
                raise
            time.sleep(1)
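
A build script then drives the bridge with calls of this shape. The tool name and argument keys below are hypothetical, shown only to illustrate the calling pattern, not the actual twozero tool schema:

python
# Placeholder usage of td_call: the tool name and argument keys are
# hypothetical, included only to show how a build script drives the bridge.
result = td_call("create_operator", {
    "parent": "/project1",
    "type": "baseCOMP",
    "name": "fx_chiaroscuro_magenta_bloom",
})
print(result)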

3-Layer Compositing Architecture

The 3-layer system emerged from a creative need for visual density. A single effect at a time felt sparse. Three effects additively composited created the layered, overwhelming visual presence that matched the source material's aesthetic. All three routers have the same 43 effects wired, and the auto-rotate system picks 3 different random effects per switch event using random.sample(range(N), 3).
1. Camera Input: videodevinTOP supports MacBook Pro Camera, OBS Virtual Camera, and iPhone via USB (Continuity Camera)

2. Effect Processing: 43 baseCOMPs each apply a unique GLSL pixel shader (inTOP to glslTOP to prominence levelTOP to outTOP)

3. 3-Router Selection: effect_router + layer2_router + layer3_router each select from the full bank of 43 effects

4. Additive Blending: blend_add1 composites layers 1+2, blend_add2 adds layer 3 (compositeTOP, add mode)

5. Master Level: blend_level applies global brightness/contrast scaling

6. Output: main_output windowCOMP at 1280x720 to projector or secondary display

Column | Type | Purpose
rms | Full spectrum (0-1) | Overall energy level drives shader intensity scaling
sub_bass | 0-80 Hz (0-1) | Sub-bass rumble drives slow displacement and deep pulse
bass | 80-300 Hz (0-1) | Kick drums drive particle velocity, UV displacement, prominence (effects 0-13)
mids | 300-3000 Hz (0-1) | Vocals and synths drive color modulation, prominence (effects 14-28)
highs | 3000 Hz+ (0-1) | Hi-hats and cymbals drive fine detail, confetti density, prominence (effects 29-42)
beat | Trigger (0/1) | Beat onset triggers 30% brightness flash and auto-rotate (every 5 beats)
onset | Transient (0-1) | Transient energy envelope gates effect switching (threshold 0.2)
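
Given the beat and onset channels above, the auto-rotate behavior (1.5-second interval, 5-beat threshold, 3 distinct effects per switch) reduces to a small amount of state. This is a simplified sketch of the selection logic only; the production chopexecuteDAT script written by td_update_rotation.py handles the actual TouchDesigner wiring:

python
# Simplified sketch of the auto-rotate selection logic. Only the timing and
# selection rules (1.5 s interval, 5-beat threshold, random.sample of 3) come
# from the text above; the surrounding wiring is omitted.
import random
import time

SWITCH_INTERVAL = 1.5   # seconds between switch events
BEAT_THRESHOLD = 5      # beats required before a switch
NUM_EFFECTS = 43

state = {"beats": 0, "last_switch": 0.0}

def on_beat(now=None):
    """Call once per detected beat; returns 3 distinct effect indices on a switch."""
    now = time.monotonic() if now is None else now
    state["beats"] += 1
    if state["beats"] >= BEAT_THRESHOLD and now - state["last_switch"] >= SWITCH_INTERVAL:
        state["beats"] = 0
        state["last_switch"] = now
        # One distinct effect per router: effect_router, layer2_router, layer3_router.
        return random.sample(range(NUM_EFFECTS), 3)
    return None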

Performance Engineering

Slowest Python effect: Kanji Float at 11.72 ms (85 FPS effective), well within the 33.3 ms budget

GLSL chain: 60 FPS at 0.2% CPU on a MacBook Pro (GPU-accelerated)

Shard Burst optimization: 42.5 ms to 1.4 ms (-97%) via vectorized rotation + a single fillPoly call

Film Grain optimization: 35 ms to 7.8 ms (-78%) via half-res uniform RNG, no HSV conversion

Test suite: 234 passing across 7 test files (smoke, audio, plugin, render, perf, video, generation)

Auto-rotate: 1.5-second interval with a 5-beat threshold and 0.8 s onset debounce

Lessons Learned

Technical Gotchas

TouchDesigner's constantCHOP does not support item assignment. Writing ae['effect_idx'] = val throws a TypeError on every frame; the correct pattern is ae.par.value0 = val (a minimal snippet of the wrong and right patterns closes this section). The auto-rotate callback must use whileOn, not onValueChange: the latter stops firing when audio values go static during silence, which stalls the entire rotation. The keyboardinDAT requires focusselect='anywhere' or it only responds when its viewer panel has focus. A switchTOP with zero inputs defaults to 128x128 resolution regardless of custom parameters. And when inserting a levelTOP between existing operators, you must explicitly disconnect the downstream input first; our initial td_add_prominence.py script had a wiring bug that left all 43 output nodes disconnected.

Process Insights

The canonical correction was the single most valuable finding. Without algorithmic analysis, the system would have shipped with a visual identity that looked nothing like the source material. Human intuition alone produced effects that were visually interesting but aesthetically wrong.

Human curation remained indispensable despite the power of algorithmic clustering. The best effects were generated from human-selected frames. The statistical analysis told us what was common in the source; the human selection told us what was good.

Mutation was more valuable than generation. The 21 mutation effects expanded the visual vocabulary more efficiently than generating 21 completely new effects would have. Mutations inherit the parent's core technique while varying specific parameters, producing a family of related looks rather than a collection of unrelated ones.

The MCP bridge approach scales. Building 43 effects programmatically through the bridge was substantially faster and more reliable than manual TouchDesigner interaction would have been. The bridge also enabled rapid iteration: when a shader didn't look right, the build script could be re-run with modified source code in seconds.
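
The constantCHOP gotcha mentioned above, as a minimal snippet (the operator name is hypothetical; the fix is the pattern described in the gotchas):

python
# Inside a TouchDesigner script callback; 'audio_engine' is a hypothetical
# constantCHOP name used only for illustration.
ae = op('audio_engine')
val = 7  # example value to write

# Wrong: CHOPs do not support item assignment; this raises TypeError every frame.
# ae['effect_idx'] = val

# Right: write to the constant CHOP's value parameter instead.
ae.par.value0 = val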

Open Source

The Yousuke system is open source under the MIT license: github.com/ConejoCapital/Yousuke

Built by Mauricio Trujillo Ramirez aka Bunny / Tektonic Company.
Powered by Claude Opus 4.7 (Anthropic) for AI effect generation.
Powered by Hermes Agent (Nous Research) for autonomous TouchDesigner construction.
MCP bridge by twozero (404.zero + setupdesign).

Special thanks to SHL0MS, Nous Research, 404.zero, and setupdesign.