Career Level

L7 — Sr AI Eval Architect


February 2026

“The CEO asks you one question: ‘Can we trust our AI?’ Your job is to make sure the answer is defensible.”

At L7, you design evaluation strategy at the company level. Not a single model, not a single product — the entire organization’s approach to knowing whether its AI works, is safe, and meets regulatory requirements. When a healthcare AI company needs to pass FDA review, you design the evaluation evidence. When a fintech company needs to prove their AI lending model isn’t biased, you build the audit framework. When leadership asks “can we ship this?”, your evaluation methodology is what gives them a defensible answer.

This is where eval meets the boardroom. You work with leadership, legal, compliance, and product teams. You understand regulatory frameworks — NIST AI RMF, EU AI Act, FDA guidance, domain-specific standards — and you translate them into practical evaluation requirements that engineering teams can actually execute. The gap between “the regulation says X” and “here’s how we measure X” is enormous. You bridge it.

Safety and alignment testing becomes a core part of your work. Not just “does the model give correct answers” but “does the model behave safely under adversarial conditions, respect boundaries, avoid harmful outputs, and align with organizational values?” These are hard evaluation problems with no established right answers. You develop the methodology.


What You Do

  • Company-level eval strategy — design how an entire organization evaluates AI. Which products need what level of evaluation? Where are the highest risks? How does evaluation integrate into the development lifecycle?
  • Regulatory compliance — translate regulatory requirements into practical evaluation frameworks. NIST AI RMF, EU AI Act, FDA, financial services regulations. Make compliance measurable.
  • Safety and alignment testing — design evaluation frameworks for AI safety. Harmful content, bias, misuse potential, boundary violations, alignment with stated values. Build methodology for problems that don’t have clean metrics.
  • Audit frameworks — build evaluation systems that produce evidence for auditors, regulators, and boards. Reproducible, documented, defensible.
  • Risk assessment — evaluate AI risk across an organization’s product portfolio. Which models pose the highest risk? Where should evaluation resources focus?
  • Executive communication — present evaluation findings to leadership in terms they understand. Not p-values — business risk, regulatory exposure, brand risk, liability.
  • Eval team strategy — advise organizations on how to build internal eval capabilities. Hiring, training, tooling, process design.
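The risk-assessment work above is often triaged with a simple likelihood × impact score across the portfolio. A minimal sketch of that triage, with hypothetical model names and scores (the scoring scale and weights are illustrative assumptions, not a prescribed method):

```python
# Hedged sketch of portfolio risk triage: score each model by
# likelihood x impact and sort, so evaluation effort goes to the
# highest-risk deployments first. Models and scores are invented.
models = [
    {"name": "loan-scoring",   "likelihood": 3, "impact": 5},  # regulated, high stakes
    {"name": "support-chat",   "likelihood": 4, "impact": 2},
    {"name": "doc-summarizer", "likelihood": 2, "impact": 1},
]

for m in models:
    m["risk"] = m["likelihood"] * m["impact"]

# Highest-risk models first: these get the deepest evaluation.
triaged = sorted(models, key=lambda m: m["risk"], reverse=True)
```

In practice the scores come from structured review (regulatory exposure, user harm potential, deployment scale), but the triage logic stays this simple: rank, then allocate evaluation depth accordingly.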

AI Skills Required

  • AI-powered regulatory analysis — use AI to parse regulatory documents, identify evaluation requirements, and map them to practical test designs
  • Safety evaluation methodology — design testing frameworks for AI safety properties that resist easy quantification (alignment, honesty, boundary respect)
  • AI-assisted audit trail design — build systems that automatically document evaluation methodology, results, and evidence chains for regulatory review
  • Advanced bias detection — design evaluation frameworks that catch subtle biases across demographic groups, use cases, and edge conditions
  • AI risk modeling — use AI to analyze model behavior patterns and predict risk areas before they manifest in production
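One concrete building block behind the bias-detection skill is a group-wise selection-rate comparison. The sketch below is a minimal, assumption-laden example: it computes positive-outcome rates per demographic group and flags disparate impact using the common "four-fifths" threshold (the data, group labels, and threshold are illustrative, and real frameworks layer many more slices and statistical tests on top):

```python
# Minimal sketch of a group-wise bias check: compare positive-outcome
# rates across demographic groups and flag disparate impact below the
# four-fifths (0.8) threshold. Records and groups are illustrative.
from collections import defaultdict

def selection_rates(records):
    """records: iterable of (group, predicted_positive: bool)."""
    pos, total = defaultdict(int), defaultdict(int)
    for group, positive in records:
        total[group] += 1
        pos[group] += int(positive)
    return {g: pos[g] / total[g] for g in total}

def disparate_impact(rates):
    """Ratio of the lowest group selection rate to the highest."""
    return min(rates.values()) / max(rates.values())

records = [("A", True)] * 40 + [("A", False)] * 60 \
        + [("B", True)] * 25 + [("B", False)] * 75
rates = selection_rates(records)      # A: 0.40, B: 0.25
ratio = disparate_impact(rates)       # 0.25 / 0.40 = 0.625
flagged = ratio < 0.8                 # fails the four-fifths rule
```

A single ratio like this is a starting signal, not a verdict; the L7's job is designing which slices, conditions, and follow-up analyses turn signals like it into a defensible remediation plan.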

Self-Evaluation Checklist

  • I’ve designed company-level eval strategy for an organization deploying AI across multiple products
  • I can translate regulatory requirements into specific, measurable evaluation criteria
  • I’ve built audit frameworks that passed regulatory or compliance review
  • My safety evaluation methodology catches issues that standard testing misses
  • I communicate eval findings to executives in terms of business risk, not just technical metrics
  • I’ve advised an organization on building their internal eval team
  • Legal and compliance teams trust my evaluation methodology to meet regulatory standards
  • I understand the regulatory landscape across 2+ jurisdictions (US, EU, APAC)
  • I’ve evaluated AI systems for bias and produced actionable remediation plans

Training Curriculum

Months 1-12: Regulatory and Safety Depth

  • Regulatory Deep Dive — comprehensive study of AI regulations across jurisdictions. NIST AI RMF implementation, EU AI Act compliance, FDA guidance for SaMD (Software as a Medical Device), financial services AI regulations. Not just reading — translating to evaluation requirements.
  • Safety Evaluation Methodology — study and develop methods for evaluating AI safety properties. Red-teaming at scale, alignment testing, boundary probing, bias auditing. Build methodology for the hard problems.
  • Audit Framework Design — build evaluation systems that produce evidence auditors trust. Documentation standards, reproducibility requirements, evidence chain integrity.
  • Executive Communication — practice presenting evaluation strategy to simulated board members and leadership. Translate technical findings into business language.
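The evidence-chain-integrity requirement in audit framework design can be made concrete with an append-only record chain: each evaluation run embeds a hash of the previous record, so any after-the-fact edit is detectable. The sketch below is a simplified illustration with invented field names, not a production audit system:

```python
# Illustrative sketch of an append-only evidence chain for eval runs:
# each record stores a hash of the previous record, so tampering with
# any earlier entry breaks the chain. Field names are hypothetical.
import hashlib
import json

def record_hash(record):
    # Canonical JSON (sorted keys) so the hash is reproducible.
    return hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()

def append_run(chain, methodology, results):
    prev = record_hash(chain[-1]) if chain else "genesis"
    chain.append({"methodology": methodology,
                  "results": results,
                  "prev_hash": prev})

def verify(chain):
    for i, rec in enumerate(chain):
        expected = record_hash(chain[i - 1]) if i else "genesis"
        if rec["prev_hash"] != expected:
            return False
    return True

chain = []
append_run(chain, "bias-audit-v1", {"disparate_impact": 0.91})
append_run(chain, "safety-redteam-v2", {"violations": 0})
ok = verify(chain)                               # chain is intact
chain[0]["results"]["disparate_impact"] = 0.99   # retroactive edit
tampered = verify(chain)                         # verification fails
```

Real audit systems add signing, timestamps, and access control, but the core property auditors care about is the same: evidence that results were not quietly revised after the fact.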

Months 13-24: Strategic Practice

  • Company-Wide Eval Design — design complete eval strategies for 2+ organizations with different risk profiles. A healthcare company and a consumer AI company have very different evaluation needs.
  • Cross-Regulatory Compliance — design eval methodology that satisfies multiple regulatory frameworks simultaneously. Companies that operate in the US and EU need unified approaches.
  • Incident Response Evaluation — design eval frameworks for when things go wrong. How do you evaluate whether a fix actually addresses the root cause? How do you prove to regulators that the problem is resolved?
  • Industry Standards Contribution — begin contributing to industry evaluation standards. Participate in working groups, comment on draft regulations, share methodology publicly.

Months 25-36: Industry Positioning

  • Thought Leadership — publish methodology work. Conference presentations, white papers, blog posts. Build your reputation as a trusted voice on AI evaluation.
  • Advisory Practice — start advising organizations on eval strategy in a consultative capacity. Build the skill of rapid assessment and strategic recommendation.
  • Multi-Company Perspective — study eval approaches across multiple companies. Identify patterns, common failures, and best practices that transcend individual organizations.
  • L8 Preparation — develop the multi-company perspective and industry standards thinking required for L8.

Ranking Standard

Metric | Threshold | How It’s Measured
Company-level strategies | 2+ organizations with designed eval strategy | Portfolio review
Regulatory compliance | Eval methodology passed regulatory or compliance review | Audit results
Safety methodology | Novel safety evaluation approach adopted by others | Peer recognition
Executive trust | Leadership actively seeks eval guidance | Client/stakeholder feedback
Audit framework quality | Frameworks produce evidence auditors trust | Audit outcomes
Industry visibility | External recognition of methodology work | Publications, talks, references

Promotion to L8

Requirements

  • Minimum 36 months at L7
  • Pass L8 qualification assessment:
    • Multi-company eval standards — present an evaluation standard designed to work across multiple companies in a domain. Panel evaluates generalizability, rigor, and practical adoptability.
    • Regulatory strategy — present how you would design eval compliance for a company operating across multiple jurisdictions with conflicting requirements.
    • Industry impact — demonstrate that your methodology has been adopted beyond a single organization.
    • Safety methodology innovation — present a novel approach to AI safety evaluation. Panel evaluates originality, rigor, and practical value.
  • Industry recognition — conference presentations, published work, or client references from multiple organizations
  • Multi-company experience — eval strategy work for 3+ different organizations

What the Panel Looks For

  • Industry perspective — do they think beyond individual companies to industry-wide evaluation standards?
  • Standards authorship — can they write evaluation standards that others adopt?
  • Regulatory mastery — do they navigate complex regulatory landscapes with confidence?
  • Influence at scale — do companies seek their guidance on evaluation strategy?
  • Innovation — are they pushing the field forward, not just practicing established methods?

Mentorship at This Level

  • You receive: Worca leadership or external advisor, quarterly check-ins. Focus on industry positioning and multi-company perspective.
  • You give: 3+ mentee slots across levels. Focus on developing L5-L6 evaluators into strategic thinkers.
  • Referral cut: 6% of mentee’s monthly rate for 15 months after placement.
  • Panel duty: You serve on all evaluation panels including L5-L6 promotions. Your standards define the upper levels of the track.

What Unlocks at L8

  • Multi-company eval standards — you define how entire industries measure AI quality
  • Industry benchmark authority — your benchmarks become reference standards
  • Domain-defining methodology — the person companies call when they need eval for a new domain
  • The path toward building eval organizations