TechAdes

Research

The open problems we're working on, and why they matter.

TechAdes is a research-driven company. The products we ship are downstream of the research questions we're trying to answer. Below are the four problem areas our work focuses on today. Each section describes what we believe is currently unsolved, how we're approaching it, and what the next twelve to eighteen months look like.


Multi-agent orchestration for creative production

The problem. Generative AI tools today are dominated by single-prompt workflows: a user writes a prompt, the model produces an output, the user evaluates. This pattern works for isolated, one-shot tasks. It breaks down for creative production at volume, where the work is iterative, layered, and brand-sensitive. A real marketing or creative team doesn't work as a single generative pass. It works as a coordinated set of specialists with planning, drafting, review, and revision cycles.

What's unsolved. Public AI agent frameworks (LangChain, AutoGen, CrewAI) provide infrastructure for agent coordination but none address creative-production-specific needs: brand-awareness across agents, taste-driven quality control, or iterative refinement under a unified style. Building this layer requires structuring how brand identity is shared between agents, how planning is decomposed across specialized roles, and how the system arrives at a final output that feels coherent and on-brand.

What we're building. TechAdes' multi-agent orchestration system specializes a small number of agents (planner, copywriter, image composer, layout, publisher) around a shared structured brand representation. The planner agent decomposes a strategic brief into a calendar of specific deliverables. Each specialist agent produces its output drawing from the brand representation and the planner's brief. A separate evaluation agent grades each deliverable for brand alignment and quality before it ships. We expect the full system to be in production within twelve to eighteen months.
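To make the plan, produce, evaluate flow concrete, here is a minimal sketch of that control loop. It is an illustration under stated assumptions, not the production system: every name in it (Deliverable, plan, specialist, evaluate, orchestrate, the 0.5 threshold) is hypothetical, the brand representation is reduced to a plain dict, and the agents are stand-in functions rather than model calls. The publisher step is omitted.

```python
from dataclasses import dataclass

# All names below are hypothetical and exist only for illustration.
KINDS = ("copy", "image", "layout")


@dataclass
class Deliverable:
    kind: str            # which specialist owns this item
    brief: str           # what the planner asked for
    content: str = ""    # filled in by the specialist
    score: float = 0.0   # filled in by the evaluator


def plan(brief: str) -> list:
    """Stand-in planner: one deliverable per specialist kind."""
    return [Deliverable(kind=k, brief=f"{brief}: {k}") for k in KINDS]


def specialist(d: Deliverable, brand: dict) -> str:
    """Stand-in specialist: a real one would call a model with the brand data."""
    return f"[{d.kind} | voice={brand['voice']}] {d.brief}"


def evaluate(d: Deliverable, brand: dict) -> float:
    """Stand-in evaluator: 1.0 if the brand voice is reflected, else 0.0."""
    return 1.0 if brand["voice"] in d.content else 0.0


def orchestrate(brief: str, brand: dict, threshold: float = 0.5):
    """Plan, produce, and grade; only deliverables at or above threshold ship."""
    shipped, held = [], []
    for d in plan(brief):
        d.content = specialist(d, brand)
        d.score = evaluate(d, brand)
        (shipped if d.score >= threshold else held).append(d)
    return shipped, held


shipped, held = orchestrate("Spring launch", {"voice": "playful"})
```

The point of the shape is that every specialist and the evaluator read from the same brand object, so coherence is enforced by shared state rather than by prompt wording alone.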


Structured brand representation

The problem. Most AI tools represent a brand as a logo, a color palette, and maybe a tone-of-voice prompt. A real brand is far more: a voice, a typographic feel, a visual lexicon, a set of cultural references, and a set of explicit do-and-don't rules. Without a richer representation, AI outputs feel generic. The same prompt produces interchangeable results across brands.

What's unsolved. No public standard exists for representing brand identity in a form generative models can reliably consume and respect. Closed solutions exist inside large agencies and a few enterprise platforms, but they are not openly documented or generalizable.

What we're building. TechAdes is developing the "Brand Brain": a structured, queryable representation of brand identity covering voice signatures (stylometric features), visual lexicon (mood, palette, typographic feel, composition rules), and behavioral rules (what the brand does, what it doesn't). The representation is extracted automatically from a company's existing materials (website, past content, uploaded documents) and refined through use. The representation is the foundation every other agent in the pipeline draws from.
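One way to picture a structured, queryable representation of this kind is a small set of typed records with a path-based lookup. The sketch below is an assumption-laden illustration, not the Brand Brain schema: the field names, the two stylometric features, and the query method are all hypothetical choices made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class VoiceSignature:
    avg_sentence_length: float   # words per sentence
    contraction_rate: float      # contractions per 100 words


@dataclass
class VisualLexicon:
    mood: list
    palette: list
    typographic_feel: str
    composition_rules: list


@dataclass
class BehavioralRules:
    do: list
    dont: list


@dataclass
class BrandBrain:
    name: str
    voice: VoiceSignature
    visual: VisualLexicon
    rules: BehavioralRules

    def query(self, path: str):
        """Dotted-path lookup, e.g. 'visual.palette' (hypothetical interface)."""
        node = self
        for part in path.split("."):
            node = getattr(node, part)
        return node


def extract_voice(samples: list) -> VoiceSignature:
    """Toy stylometric extraction from existing text samples."""
    text = " ".join(samples)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    contractions = [w for w in words if "'" in w.strip("'")]
    return VoiceSignature(
        avg_sentence_length=len(words) / max(len(sentences), 1),
        contraction_rate=100 * len(contractions) / max(len(words), 1),
    )


brain = BrandBrain(
    name="ExampleCo",  # hypothetical brand
    voice=extract_voice(["We're here. It works."]),
    visual=VisualLexicon(["calm"], ["#112233"], "geometric sans", ["centered subject"]),
    rules=BehavioralRules(do=["use plain words"], dont=["use exclamation marks"]),
)
palette = brain.query("visual.palette")
```

A real voice signature would use many more features than two; the example only shows that the extracted values live in the same object the agents query.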


Automated brand-consistency grading

The problem. A human creative director knows in two seconds whether an output is on-brand. AI systems don't. There is no standard automated method for measuring whether a generated image, post, or piece of copy matches a brand's identity. The result: AI-produced creative requires extensive human review, which negates much of the efficiency gain.

What's unsolved. Vision-language models can describe images in detail but are not aligned to brand-specific style criteria. Text classifiers can detect tone but not voice-signature coherence. There is no published benchmark for "brand alignment" as an automatically measurable property.

What we're building. TechAdes' brand-consistency grader uses a combination of vision-language models, stylometric analysis, and the Brand Brain representation to produce a per-output alignment score. The grader runs inside the production loop: outputs failing the grader trigger automatic refinement before being surfaced to the user. The grader's training data and evaluation methodology will be one of our first public research contributions.
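The grade-then-refine loop described above can be sketched as follows. This is a minimal illustration under assumptions: the function name, the 0.8 threshold, and the cap of three rounds are hypothetical, and the toy grader and refiner stand in for the model-based components.

```python
def refine_until_aligned(draft, grade, refine, threshold=0.8, max_rounds=3):
    """Re-run refinement until the score clears the threshold or rounds run out.

    Returns (final_draft, final_score, passed).
    """
    score = grade(draft)
    rounds = 0
    while score < threshold and rounds < max_rounds:
        draft = refine(draft, score)
        score = grade(draft)
        rounds += 1
    return draft, score, score >= threshold


# Toy stand-ins: the "brand" is just two required terms.
REQUIRED = ("bold", "simple")


def toy_grade(text: str) -> float:
    """Fraction of required brand terms present in the text."""
    return sum(term in text.lower() for term in REQUIRED) / len(REQUIRED)


def toy_refine(text: str, score: float) -> str:
    """Append the first missing term; a real refiner would regenerate the output."""
    for term in REQUIRED:
        if term not in text.lower():
            return f"{text} {term}"
    return text


final, score, passed = refine_until_aligned("A new product.", toy_grade, toy_refine)
```

Capping the number of rounds matters in practice: an output that never clears the threshold should be escalated to a person rather than retried indefinitely.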


Generative Engine Optimization (GEO)

The problem. Discovery is moving from search engines to AI answer engines (ChatGPT, Perplexity, Google AI Overviews, Anthropic Claude). When a user asks an AI assistant a question, the answer cites a small number of sources. Being one of those sources is the new search ranking. Traditional SEO tools and methodologies do not address this surface.

What's unsolved. There is no established practice or measurement framework for optimizing content to be cited by AI answer engines. Different engines surface different content for reasons that are not yet well-characterized publicly.

What we're building. TechAdes is developing GEO measurement and optimization tooling alongside Obelo. The work involves reverse-engineering citation patterns across AI answer engines, structuring content to maximize the probability of citation, and building monitoring tools that track a brand's presence across engines over time. We expect GEO to be a substantive surface by 2027 and want TechAdes positioned as a credible voice in it.
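As an illustration of what monitoring a brand's presence across engines could measure, the sketch below computes a per-engine citation share: the fraction of tracked queries whose answer cited the brand's domain. The metric, the function name, the observation format, and the sample data are all assumptions made for the example; how the observations are collected from each engine is out of scope here.

```python
from collections import defaultdict
from urllib.parse import urlparse


def citation_share(observations, brand_domain: str) -> dict:
    """Per-engine fraction of answers that cite the brand's domain.

    observations: iterable of (engine, query, cited_urls) tuples (assumed format).
    """
    asked = defaultdict(int)
    cited = defaultdict(int)
    for engine, _query, urls in observations:
        asked[engine] += 1
        hosts = {urlparse(u).hostname or "" for u in urls}
        # Count the exact domain and any subdomain such as www.
        if any(h == brand_domain or h.endswith("." + brand_domain) for h in hosts):
            cited[engine] += 1
    return {engine: cited[engine] / asked[engine] for engine in asked}


# Hypothetical observations for illustration only.
observations = [
    ("engine_a", "best crm tools", ["https://www.example.com/post", "https://other.org/x"]),
    ("engine_a", "crm pricing", ["https://other.org/y"]),
    ("engine_b", "best crm tools", ["https://example.com/"]),
]
shares = citation_share(observations, "example.com")
```

Tracking this share per engine over time gives a simple trend line, which is the kind of signal a monitoring tool would need before any optimization can be evaluated.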


Open research

Research projects that sit outside the product line. Exploratory work that informs how we build, and that we expect to publish or open source as it matures.

Meta-Harness++

A composite agent harness for software-engineering bug repair, restricted to DeepSeek's two model families (deepseek-reasoner and deepseek-v4-pro). The constraint is the point: a deliberately narrow model surface, with everything else (retrieval, prompting strategy, candidate generation, validation) engineered to compensate. The benchmark is SWE-Bench Pro, scored against the official hidden test suite.

The harness layers per-instance skeleton building, BM25 + embedding + reranker retrieval, multi-strategy patch generation across 28+ prompting "levers," candidate validation with a strict pass criterion (≥1 PASSED, 0 FAILED), and the official Scale grader. A worktree-edit semantic repair strategy sidesteps the model's weak unified-diff emission by having it write full file contents, then deriving the patch with git diff. A SQLite-backed candidate store indexes every diff with its lever, model, syntax-validation status, and grade signature, enabling yield-weighted Thompson-style scheduling over the lever set. Two systemd-managed loops run around the clock against separate instance slices, with full state checkpointing and self-restart on crash.
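The interaction between the strict pass criterion, the candidate store, and Thompson-style scheduling can be sketched in a few dozen lines. This is an illustration under assumptions, not the harness code: the table schema, the function names, and the uniform Beta(1, 1) prior are hypothetical, and the grade signature is reduced to passed and failed test counts.

```python
import random
import sqlite3


def strict_pass(passed: int, failed: int) -> bool:
    """Strict criterion from the text: at least one PASSED and zero FAILED."""
    return passed >= 1 and failed == 0


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the candidate table (hypothetical schema)."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS candidates (
               id INTEGER PRIMARY KEY,
               instance TEXT NOT NULL,
               lever TEXT NOT NULL,
               model TEXT NOT NULL,
               diff TEXT NOT NULL,
               syntax_ok INTEGER NOT NULL,
               passed INTEGER NOT NULL,
               failed INTEGER NOT NULL)"""
    )
    return db


def record(db, instance, lever, model, diff, syntax_ok, passed, failed):
    """Store one graded candidate diff."""
    db.execute(
        "INSERT INTO candidates (instance, lever, model, diff, syntax_ok, passed, failed)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (instance, lever, model, diff, int(syntax_ok), passed, failed),
    )
    db.commit()


def lever_stats(db, levers):
    """Count strict wins and losses per lever from the store."""
    stats = {lever: [0, 0] for lever in levers}
    for lever, passed, failed in db.execute("SELECT lever, passed, failed FROM candidates"):
        if lever in stats:
            stats[lever][0 if strict_pass(passed, failed) else 1] += 1
    return stats


def pick_lever(db, levers, rng):
    """Thompson sampling: draw from each lever's Beta posterior, take the max."""
    stats = lever_stats(db, levers)
    draws = {lever: rng.betavariate(1 + w, 1 + l) for lever, (w, l) in stats.items()}
    return max(draws, key=draws.get)


db = open_store()
for _ in range(20):
    record(db, "inst-1", "lever_a", "deepseek-reasoner", "(diff text)", True, 3, 0)
    record(db, "inst-1", "lever_b", "deepseek-reasoner", "(diff text)", True, 0, 2)
rng = random.Random(0)
levers = ["lever_a", "lever_b", "lever_c"]
picks = [pick_lever(db, levers, rng) for _ in range(200)]
```

In this toy run, lever_a has the best record and is chosen most often, while the untried lever_c still gets occasional draws because its posterior is wide. That balance between exploiting known yield and exploring untested levers is the reason to sample from the posterior rather than always picking the current best.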

Current results: 74.4% strict-PASS on a 43-instance development slice; 20.1% on a 199-instance broader sample across nine repos and multiple languages. The 74.4% figure is on a slice we used to validate the harness and is not directly comparable to Anthropic's published 64.3% Claude Opus 4.7 Adaptive result on the full 731-instance public set. An apples-to-apples claim requires a full-benchmark submission, which is the next milestone. The harness is about 10,000 lines of Python across its layers, and the autonomous loops continue to mine new wins through lever rotation and targeted near-miss attacks.
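A short calculation shows how much sampling uncertainty a 43-instance slice carries. The counts below are inferred, not reported: 32 of 43 and 40 of 199 are the only whole-number counts that round to the stated 74.4% and 20.1%. The Wilson score interval is a standard choice for a binomial proportion and is used here as an assumption; it captures sampling noise only, not the selection effect of developing the harness against the same slice.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Inferred counts (assumption): the only integers matching the stated percentages.
dev_rate = round(100 * 32 / 43, 1)       # 43-instance development slice
broad_rate = round(100 * 40 / 199, 1)    # 199-instance broader sample

dev_lo, dev_hi = wilson_interval(32, 43)
broad_lo, broad_hi = wilson_interval(40, 199)
```

For the development slice the interval spans roughly 60% to 85%, which is wide enough that the slice result alone cannot rank the harness against a full-benchmark number.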

Inference-time methods, Stanford-inspired

A second active research effort, inspired by recent work out of Stanford, is in early-stage development. Notes and results will follow here as the work approaches publication.