See how they actually work.
Not how they interview.
AI-native work trials that predict on-the-job performance — with evidence.
Trusted by founders from YC, SPC, and more
Hiring is broken.
Everyone knows it.
That sinking feeling three months in when your "great interview" turns into a performance problem.
Resumes lie
Credentials don't predict performance. You're hiring based on marketing, not capability.
Interviews are theater
Candidates perform rehearsed answers. You learn who interviews well, not who ships.
LeetCode is cargo cult
Inverting binary trees has nothing to do with building products. Memorization isn't problem-solving.
AI changes everything
Your best hire uses AI to 10x their output. Traditional interviews penalize this.
"The best predictor of future performance is past performance — in similar conditions."
So why are we still using artificial conditions to predict real work?
Four steps to certainty
A work trial that shows you who someone is — not who they pretend to be in an interview.
Local Environment
Real environment. Real tools.
Candidate works on their own machine with their choice of AI tools. Just like their actual job.
Ambiguous Assignment
Designed to reveal agency.
A real-world project with incomplete requirements. See how they handle ambiguity and turn chaos into shipped code.
Build a support inbox triage system: ingest tickets, cluster them by topic, propose auto-replies, and provide an admin UI for review.
"Requirements intentionally incomplete. Ask questions. Make decisions. Ship something real."
Observer Agent
Intelligent probing. Not surveillance.
Our AI observes how they work and asks surgical questions at key moments to understand their thinking.
Evidence-Based Report
Signal, not vibes.
Rubric scores, evidence clips, and a clear recommendation you can act on.
Decisions backed by evidence
Not a vibe. A report that makes the hiring decision obvious — with receipts.
Candidate Evaluation Report
Full-Stack Engineer • Jan 2025
14.2 hours
47
12
Rubric Scores
Key Evidence
14:23 — "Chose to ship auth-less MVP first, documented security as Day 2 priority. Explicitly traded speed for completeness."
16:45 — When tests failed, diagnosed root cause in 8 mins using Claude Code. Fixed without introducing regressions.
What we measure
Six dimensions that predict on-the-job success. Each scored 1–5 with evidence.
Intelligence
Problem decomposition, pattern recognition, learning speed
Grit
Persistence through ambiguity and setbacks
AI Tool Usage
Leverages AI effectively as a force multiplier
Openness
Receptivity to feedback, new approaches, and change
Judgment
Prioritization, tradeoffs, knowing what matters
Agency
Proactive decisions, owns outcomes, escalates with options
Built for the real world
Not another coding test. A rethinking of how to predict who will ship.
Tests ambiguity tolerance
Incomplete specs, shifting requirements. See who thrives in chaos and who freezes.
Measures actual agency
Do they wait for permission or make decisions? Agency is observable, not self-reported.
Observes AI usage (doesn't ban it)
We measure how well they leverage AI tools — not whether they can work without them.
3-day signal vs 1-hour snapshot
Energy management, iteration cycles, how they handle getting stuck. Patterns, not performances.
Defensible hiring decisions
Every recommendation backed by timestamped evidence. Show your board exactly why you hired them.
Works with AI-native candidates
The best engineers use every tool available. So should the evaluation.
Traditional hiring vs. Polymath
Traditional hiring
- ✕ Resume screening (marketing document)
- ✕ LeetCode (tests memorization)
- ✕ Behavioral interviews (rehearsed answers)
- ✕ "Culture fit" (vibes-based)
- ✕ Decision made on 3 hours of performance
Polymath
- ✓ Real work trial (actual output)
- ✓ Ambiguous problems (tests thinking)
- ✓ Observed behavior (not self-reported)
- ✓ Evidence-backed rubric (defensible)
- ✓ 3-day signal (patterns, not moments)
For teams who can't afford to guess
Bad hires cost 6–12 months. The wrong engineer can kill a startup.
Startup Founders
Seed to Series B
Every engineer either accelerates or drags. Get certainty before committing $200K+/year.
"We've been burned twice by 'great interviews.' Never again."
Hiring Managers
Engineering leads
Your gut says hire, but can you defend it? Get a report that makes the decision obvious.
"Finally, something that shows me how they actually work."
Technical Recruiters
In-house teams
Resumes all look the same. Give your hiring managers signal they can act on.
"The report sells itself. Hiring managers trust it."
Currently onboarding pilot partners
We're working with a small group of startups to refine the process. Hiring engineers soon? Let's talk.
Become a Pilot Partner
Join the waitlist
Limited pilot spots available. Be among the first.