Career Level

L8 — Lead AI Eval Architect


February 2026

“When a company in a new domain needs to evaluate their AI, they call you. Not because you know their domain — because you know evaluation.”

At L8, your work shapes how entire industries measure AI quality. You design evaluation standards that multiple companies adopt. Your benchmarks become the reference points that organizations measure against. When a new AI domain emerges — autonomous vehicles, synthetic biology, AI-assisted surgery — and nobody knows how to evaluate it, you’re the person who designs the methodology.

This is industry-level impact. Your eval standards don’t just serve one client — they serve a domain. When you design an evaluation framework for healthcare AI, it doesn’t just work for one company’s diagnostic model. It works for diagnostic AI as a category. Other eval engineers use your frameworks as starting points. Your benchmarks become the industry standard that companies report against.

The difference between L7 and L8: L7 designs eval strategy for a company. L8 designs eval standards for an industry. L7 satisfies a specific company’s regulators. L8 shapes what regulators expect. The scope is fundamentally different — and so is the responsibility. When your standards are wrong, entire industries measure the wrong things.


What You Do

  • Design multi-company eval standards — evaluation frameworks that work across organizations in a domain. Healthcare AI eval standards. Financial AI audit frameworks. Semiconductor AI quality benchmarks.
  • Build industry benchmarks — create evaluation datasets and methodologies that become reference standards. The benchmarks other companies measure against.
  • Shape regulatory expectations — work with regulators, standards bodies, and industry groups to define how AI should be evaluated. Your methodology influences what compliance looks like.
  • Domain-entry eval design — when AI enters a new domain, design the evaluation methodology from first principles. No templates to copy. No existing standards to follow.
  • Eval methodology research — push the boundaries of how evaluation is done. Novel metrics, new approaches, better statistical methods. Publish and share.
  • Advisory at scale — advise multiple companies on eval strategy simultaneously. Pattern-match across organizations to identify common pitfalls and best practices.
  • Standards body participation — contribute to NIST, ISO, IEEE, or domain-specific standards bodies. Your voice shapes formal standards.

AI Skills Required

  • AI-assisted standards design — use AI to analyze evaluation approaches across many organizations, identify patterns, and draft standards that capture best practices
  • Benchmark engineering at scale — build evaluation datasets and pipelines that serve as industry references. Handle the complexity of multi-organization benchmarking.
  • AI-powered regulatory intelligence — track evolving regulations across jurisdictions, identify evaluation implications, and proactively update standards
  • Meta-evaluation — evaluate evaluation methodologies themselves. Are they measuring what they claim? Are they robust across organizations?
  • AI research methodology — contribute to the academic and industry literature on AI evaluation. Design and run rigorous studies.
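Meta-evaluation sounds abstract, but it has concrete teeth: before trusting a benchmark to rank two systems, check whether its verdict survives resampling. A minimal sketch in Python (a paired bootstrap; the function name, item counts, and thresholds are illustrative, not a prescribed method):

```python
import random

def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, seed=0):
    """95% bootstrap CI on the mean score difference between two systems
    evaluated on the same benchmark items. If the interval straddles
    zero, the benchmark cannot reliably separate them."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(scores_a[i] - scores_b[i] for i in idx) / n)
    diffs.sort()
    return diffs[int(0.025 * n_resamples)], diffs[int(0.975 * n_resamples)]

# Two systems scored 0/1 on the same 100 items; A is clearly stronger.
a = [1] * 80 + [0] * 20
b = [1] * 55 + [0] * 45
lo, hi = paired_bootstrap_ci(a, b)  # interval excludes zero: A reliably wins
```

A benchmark whose rankings flip under this kind of resampling is not ready to serve as an industry reference, whatever its face validity.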

Self-Evaluation Checklist

  • I’ve designed evaluation standards adopted by multiple organizations
  • My benchmarks are used as reference standards in at least one domain
  • I’ve contributed to formal standards body work (NIST, ISO, IEEE, or equivalent)
  • Regulators or compliance teams cite my methodology as evidence of best practices
  • I’ve designed eval methodology for a domain that had no existing evaluation standards
  • I advise 3+ organizations on eval strategy
  • My published work is cited by other eval practitioners
  • I can design an evaluation framework for a domain I’ve never worked in and have domain experts validate it within a week
  • Companies in new AI domains seek me out to help them figure out how to measure quality

Training Curriculum

Months 1-12: Industry Standards

  • Standards Body Engagement — active participation in AI evaluation standards development. NIST AI RMF working groups, ISO AI standards committees, domain-specific bodies.
  • Multi-Company Benchmark Design — build evaluation benchmarks intended for industry-wide use. Handle the challenges: different organizations have different data, different quality bars, different use cases.
  • Regulatory Influence — learn how standards bodies work. How to propose standards. How to build consensus. How to write specifications that are precise enough to be useful and flexible enough to be adoptable.
  • Cross-Industry Analysis — study evaluation approaches across multiple industries. What can fintech eval learn from healthcare eval? What can semiconductor eval teach autonomous vehicle eval?
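A recurring obstacle in multi-company benchmark design is that each organization grades with its own quality bar. A hedged first check is whether two organizations' graders even agree on the same items; a minimal Cohen's kappa sketch (the labels and data are illustrative assumptions):

```python
def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two graders on the same items.
    1.0 means perfect agreement; 0.0 means no better than chance."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    cats = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in cats
    )
    if expected == 1.0:  # both graders used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)

# Pass/fail verdicts from two organizations on the same six items.
org_a = ["pass", "pass", "fail", "pass", "fail", "pass"]
org_b = ["pass", "fail", "fail", "pass", "fail", "pass"]
kappa = cohens_kappa(org_a, org_b)
```

Low kappa between organizations means the benchmark is measuring grader taste, not model quality, and no amount of aggregation will fix that downstream.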

Months 13-24: Domain Methodology Creation

  • New Domain Entry — when AI enters a domain with no existing eval standards, design the methodology from scratch. Practice with 2+ domains.
  • Benchmark Governance — build systems for maintaining benchmarks over time. Version control, contamination prevention, difficulty calibration, relevance updates.
  • Methodology Publication — write and publish evaluation methodology work. Peer review, industry feedback, iteration.
  • Advisory Scale — develop the ability to advise multiple organizations simultaneously. Pattern recognition across clients. Scalable methodology delivery.
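Contamination prevention, one of the benchmark-governance duties above, can start with something as simple as n-gram overlap screening between benchmark items and a candidate training corpus. A minimal sketch (function names and the n-gram length are assumptions, not an established standard, and overlap is a leak indicator rather than proof):

```python
def ngrams(text, n=8):
    """Set of word n-grams, lowercased, for overlap screening."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(benchmark_items, training_docs, n=8):
    """Fraction of benchmark items that share at least one word n-gram
    with any training document -- a crude screen, not a verdict."""
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    flagged = sum(1 for item in benchmark_items if ngrams(item, n) & train_grams)
    return flagged / len(benchmark_items)
```

Production governance would add fuzzy matching, canary strings, and versioned release logs, but even this screen catches verbatim leakage that silently inflates scores.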

Months 25-36: Organization Building Preparation

  • Eval Organization Design — study how eval teams are structured at scale. Hiring, training, culture, tooling, process.
  • Leadership Development — develop the leadership skills for L9: hiring, team building, culture setting, strategic planning.
  • Industry Reputation — build the reputation and network needed to lead an eval organization. Speaking, publishing, advising, connecting.
  • L9 Portfolio — compile evidence of industry impact for L9 assessment.

Ranking Standard

Metric | Threshold | How It's Measured
Multi-company standards | 2+ standards adopted by multiple organizations | Adoption tracking
Industry benchmarks | 1+ benchmark used as industry reference | Citation and usage data
Standards body contribution | Active participation in formal standards work | Membership and contribution records
New domain methodology | Eval framework for 1+ domain with no prior standards | Domain expert validation
Advisory reach | 3+ organizations advised on eval strategy | Client records
Published methodology | 2+ published works on evaluation methodology | Publication records

Promotion to L9

Requirements

  • Minimum 36 months at L8
  • Pass L9 qualification assessment:
    • Organization design — present how you would build an eval organization from zero. Hiring, training, culture, methodology, tooling, client engagement. Panel evaluates completeness, practicality, and vision.
    • Industry impact — demonstrate that your evaluation standards have shaped how companies in at least one domain measure AI quality.
    • Leadership assessment — the panel evaluates your ability to attract, develop, and retain top eval talent. Can you build a team that’s better than any individual?
    • Vision presentation — where is AI evaluation going? What should the field look like in 5 years? Panel evaluates strategic clarity and credibility.
  • Industry recognition — widely recognized as an authority on AI evaluation methodology
  • Multi-company impact — eval standards or benchmarks used by 5+ organizations
  • Leadership proof — demonstrated ability to develop senior eval talent (L5+)

What the Panel Looks For

  • Organization builder — can they build a team, not just a methodology?
  • Talent magnet — do good evaluators want to work with them?
  • Industry authority — is their voice trusted across the industry?
  • Vision — do they see where the field needs to go?
  • Culture instinct — can they build an eval culture that produces great work consistently?

Mentorship at This Level

  • You receive: Worca Partner or external industry advisor, quarterly strategic reviews.
  • You give: Mentorship focused on L6-L7 evaluators. Developing the next generation of eval architects.
  • Referral cut: 7% of mentee’s monthly rate for 18 months after placement.
  • Industry role: Your reputation is part of Worca’s brand. Your standards work represents Worca in the industry.

What Unlocks at L9

  • Organization building — you build and lead eval teams, not just design methodology
  • Hiring authority — you attract and develop top eval talent
  • Culture definition — your values shape how an eval organization operates
  • The path toward Worca Partner