Career Level

L7 — Sr AI Eval Architect


February 2026

“The CEO asks you one question: ‘Can we trust our AI?’ Your job is to make sure the answer is defensible.”

At L7, you design evaluation strategy at the company level. Not a single model, not a single product — the entire organization’s approach to knowing whether its AI works, is safe, and meets regulatory requirements. When a healthcare AI company needs to pass FDA review, you design the evaluation evidence. When a fintech company needs to prove their AI lending model isn’t biased, you build the audit framework. When leadership asks “can we ship this?”, your evaluation methodology is what gives them a defensible answer.

This is where eval meets the boardroom. You work with leadership, legal, compliance, and product teams. You understand regulatory frameworks — NIST AI RMF, EU AI Act, FDA guidance, domain-specific standards — and you translate them into practical evaluation requirements that engineering teams can actually execute. The gap between “the regulation says X” and “here’s how we measure X” is enormous. You bridge it.

Safety and alignment testing becomes a core part of your work. Not just “does the model give correct answers” but “does the model behave safely under adversarial conditions, respect boundaries, avoid harmful outputs, and align with organizational values?” These are hard evaluation problems with no established right answers. You develop the methodology.


What You Do

  • Company-level eval strategy — design how an entire organization evaluates AI. Which products need what level of evaluation? Where are the highest risks? How does evaluation integrate into the development lifecycle?
  • Regulatory compliance — translate regulatory requirements into practical evaluation frameworks. NIST AI RMF, EU AI Act, FDA, financial services regulations. Make compliance measurable.
  • Safety and alignment testing — design evaluation frameworks for AI safety. Harmful content, bias, misuse potential, boundary violations, alignment with stated values. Build methodology for problems that don’t have clean metrics.
  • Audit frameworks — build evaluation systems that produce evidence for auditors, regulators, and boards. Reproducible, documented, defensible.
  • Risk assessment — evaluate AI risk across an organization’s product portfolio. Which models pose the highest risk? Where should evaluation resources focus?
  • Executive communication — present evaluation findings to leadership in terms they understand. Not p-values — business risk, regulatory exposure, brand risk, liability.
  • Eval team strategy — advise organizations on how to build internal eval capabilities. Hiring, training, tooling, process design.
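The risk-assessment work above is often triaged with a simple likelihood × impact score across the portfolio. A minimal sketch of that triage, with hypothetical model names and scores (the scoring scale and weights are illustrative assumptions, not a prescribed method):

```python
# Hedged sketch of portfolio risk triage: score each model by
# likelihood x impact and sort, so evaluation effort goes to the
# highest-risk deployments first. Models and scores are invented.
models = [
    {"name": "loan-scoring",   "likelihood": 3, "impact": 5},  # regulated, high stakes
    {"name": "support-chat",   "likelihood": 4, "impact": 2},
    {"name": "doc-summarizer", "likelihood": 2, "impact": 1},
]

for m in models:
    m["risk"] = m["likelihood"] * m["impact"]

# Highest-risk models first: these get the deepest evaluation.
triaged = sorted(models, key=lambda m: m["risk"], reverse=True)
```

In practice the scores come from structured review (regulatory exposure, user harm potential, deployment scale), but the triage logic stays this simple: rank, then allocate evaluation depth accordingly.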

AI Skills Required

  • AI-powered regulatory analysis — use AI to parse regulatory documents, identify evaluation requirements, and map them to practical test designs
  • Safety evaluation methodology — design testing frameworks for AI safety properties that resist easy quantification (alignment, honesty, boundary respect)
  • AI-assisted audit trail design — build systems that automatically document evaluation methodology, results, and evidence chains for regulatory review
  • Advanced bias detection — design evaluation frameworks that catch subtle biases across demographic groups, use cases, and edge conditions
  • AI risk modeling — use AI to analyze model behavior patterns and predict risk areas before they manifest in production
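One concrete building block behind the bias-detection skill is a group-wise selection-rate comparison. The sketch below is a minimal, assumption-laden example: it computes positive-outcome rates per demographic group and flags disparate impact using the common "four-fifths" threshold (the data, group labels, and threshold are illustrative, and real frameworks layer many more slices and statistical tests on top):

```python
# Minimal sketch of a group-wise bias check: compare positive-outcome
# rates across demographic groups and flag disparate impact below the
# four-fifths (0.8) threshold. Records and groups are illustrative.
from collections import defaultdict

def selection_rates(records):
    """records: iterable of (group, predicted_positive: bool)."""
    pos, total = defaultdict(int), defaultdict(int)
    for group, positive in records:
        total[group] += 1
        pos[group] += int(positive)
    return {g: pos[g] / total[g] for g in total}

def disparate_impact(rates):
    """Ratio of the lowest group selection rate to the highest."""
    return min(rates.values()) / max(rates.values())

records = [("A", True)] * 40 + [("A", False)] * 60 \
        + [("B", True)] * 25 + [("B", False)] * 75
rates = selection_rates(records)      # A: 0.40, B: 0.25
ratio = disparate_impact(rates)       # 0.25 / 0.40 = 0.625
flagged = ratio < 0.8                 # fails the four-fifths rule
```

A single ratio like this is a starting signal, not a verdict; the L7's job is designing which slices, conditions, and follow-up analyses turn signals like it into a defensible remediation plan.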

Self-Evaluation Checklist

  • I’ve designed company-level eval strategy for an organization deploying AI across multiple products
  • I can translate regulatory requirements into specific, measurable evaluation criteria
  • I’ve built audit frameworks that passed regulatory or compliance review
  • My safety evaluation methodology catches issues that standard testing misses
  • I communicate eval findings to executives in terms of business risk, not just technical metrics
  • I’ve advised an organization on building their internal eval team
  • Legal and compliance teams trust my evaluation methodology to meet regulatory standards
  • I understand the regulatory landscape across 2+ jurisdictions (US, EU, APAC)
  • I’ve evaluated AI systems for bias and produced actionable remediation plans

Training Curriculum

Months 1-12: Regulatory and Safety Depth

  • Regulatory Deep Dive — comprehensive study of AI regulations across jurisdictions. NIST AI RMF implementation, EU AI Act compliance, FDA guidance for SaMD (Software as a Medical Device), financial services AI regulations. Not just reading — translating to evaluation requirements.
  • Safety Evaluation Methodology — study and develop methods for evaluating AI safety properties. Red-teaming at scale, alignment testing, boundary probing, bias auditing. Build methodology for the hard problems.
  • Audit Framework Design — build evaluation systems that produce evidence auditors trust. Documentation standards, reproducibility requirements, evidence chain integrity.
  • Executive Communication — practice presenting evaluation strategy to simulated board members and leadership. Translate technical findings into business language.
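The evidence-chain-integrity requirement in audit framework design can be made concrete with an append-only record chain: each evaluation run embeds a hash of the previous record, so any after-the-fact edit is detectable. The sketch below is a simplified illustration with invented field names, not a production audit system:

```python
# Illustrative sketch of an append-only evidence chain for eval runs:
# each record stores a hash of the previous record, so tampering with
# any earlier entry breaks the chain. Field names are hypothetical.
import hashlib
import json

def record_hash(record):
    # Canonical JSON (sorted keys) so the hash is reproducible.
    return hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()

def append_run(chain, methodology, results):
    prev = record_hash(chain[-1]) if chain else "genesis"
    chain.append({"methodology": methodology,
                  "results": results,
                  "prev_hash": prev})

def verify(chain):
    for i, rec in enumerate(chain):
        expected = record_hash(chain[i - 1]) if i else "genesis"
        if rec["prev_hash"] != expected:
            return False
    return True

chain = []
append_run(chain, "bias-audit-v1", {"disparate_impact": 0.91})
append_run(chain, "safety-redteam-v2", {"violations": 0})
ok = verify(chain)                               # chain is intact
chain[0]["results"]["disparate_impact"] = 0.99   # retroactive edit
tampered = verify(chain)                         # verification fails
```

Real audit systems add signing, timestamps, and access control, but the core property auditors care about is the same: evidence that results were not quietly revised after the fact.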

Months 13-24: Strategic Practice

  • Company-Wide Eval Design — design complete eval strategies for 2+ organizations with different risk profiles. A healthcare company and a consumer AI company have very different evaluation needs.
  • Cross-Regulatory Compliance — design eval methodology that satisfies multiple regulatory frameworks simultaneously. Companies that operate in the US and EU need unified approaches.
  • Incident Response Evaluation — design eval frameworks for when things go wrong. How do you evaluate whether a fix actually addresses the root cause? How do you prove to regulators that the problem is resolved?
  • Industry Standards Contribution — begin contributing to industry evaluation standards. Participate in working groups, comment on draft regulations, share methodology publicly.

Months 25-36: Industry Positioning

  • Thought Leadership — publish methodology work. Conference presentations, white papers, blog posts. Build your reputation as a trusted voice on AI evaluation.
  • Advisory Practice — start advising organizations on eval strategy in a consultative capacity. Build the skill of rapid assessment and strategic recommendation.
  • Multi-Company Perspective — study eval approaches across multiple companies. Identify patterns, common failures, and best practices that transcend individual organizations.
  • L8 Preparation — develop the multi-company perspective and industry standards thinking required for L8.

Ranking Standard

Metric | Threshold | How It’s Measured
Company-level strategies | 2+ organizations with designed eval strategy | Portfolio review
Regulatory compliance | Eval methodology passed regulatory or compliance review | Audit results
Safety methodology | Novel safety evaluation approach adopted by others | Peer recognition
Executive trust | Leadership actively seeks eval guidance | Client/stakeholder feedback
Audit framework quality | Frameworks produce evidence auditors trust | Audit outcomes
Industry visibility | External recognition of methodology work | Publications, talks, references

Promotion to L8

Requirements

  • Minimum 36 months at L7
  • Pass L8 qualification assessment:
    • Multi-company eval standards — present an evaluation standard designed to work across multiple companies in a domain. Panel evaluates generalizability, rigor, and practical adoptability.
    • Regulatory strategy — present how you would design eval compliance for a company operating across multiple jurisdictions with conflicting requirements.
    • Industry impact — demonstrate that your methodology has been adopted beyond a single organization.
    • Safety methodology innovation — present a novel approach to AI safety evaluation. Panel evaluates originality, rigor, and practical value.
  • Industry recognition — conference presentations, published work, or client references from multiple organizations
  • Multi-company experience — eval strategy work for 3+ different organizations

What the Panel Looks For

  • Industry perspective — do they think beyond individual companies to industry-wide evaluation standards?
  • Standards authorship — can they write evaluation standards that others adopt?
  • Regulatory mastery — do they navigate complex regulatory landscapes with confidence?
  • Influence at scale — do companies seek their guidance on evaluation strategy?
  • Innovation — are they pushing the field forward, not just practicing established methods?

Mentorship at This Level

  • You receive: Worca leadership or external advisor, quarterly check-ins. Focus on industry positioning and multi-company perspective.
  • You give: 3+ mentee slots across levels. Focus on developing L5-L6 evaluators into strategic thinkers.
  • Referral cut: 6% of mentee’s monthly rate for 15 months after placement.
  • Panel duty: You serve on all evaluation panels including L5-L6 promotions. Your standards define the upper levels of the track.

What Unlocks at L8

  • Multi-company eval standards — you define how entire industries measure AI quality
  • Industry benchmark authority — your benchmarks become reference standards
  • Domain-defining methodology — the person companies call when they need eval for a new domain
  • The path toward building eval organizations