Project Atlas: What 10,000 AI Coaching Sessions Taught Us About Founder Decision-Making

We spent the last six months running a quiet experiment inside MentorMe.

We gave 847 early-stage founders access to Atlas — our multi-agent AI coaching system — and tracked everything. Every question they asked. Every recommendation the system made. Every decision they changed (or didn't change) after getting advice. Every session that ended with action versus every session that ended with more confusion.

We logged 10,247 coaching sessions across those six months. What we learned about AI coaching, why "more advice" is the wrong instinct for struggling founders, and what the architecture around founder support has to look like next — that's what this post is about.

The headline finding will surprise you: the sessions where Atlas gave LESS advice produced 3.4x more founder follow-through than sessions where it gave comprehensive guidance. The more complete the recommendation, the less likely the founder was to act on it.

That single insight reshaped how we think about AI coaching entirely.

What Changed With Atlas v3

Previous versions of MentorMe's coaching worked like most AI assistants: founder asks a question, system generates an answer. Sometimes a good answer. Sometimes a great one. But our follow-through data was ugly — only 12% of founders implemented the advice they received within 7 days.

Twelve percent. That's worse than the implementation rate for human mentorship programs, which typically land around 25-35%.

The obvious reaction was to make the advice better. More specific. More actionable. More tailored to each founder's context. So we did. Atlas v2 incorporated company stage, revenue data, team size, industry vertical, and 14 other context signals to generate hyper-personalized recommendations.

Follow-through dropped to 9%.

We were solving the wrong problem.

The Advice Paradox

Here's what nobody tells you about founder coaching: the bottleneck is almost never information quality. It's decision confidence.

When a founder asks "should I hire a marketer or invest in paid ads?" they usually already know the answer. They've thought about it. They've read the blog posts. They've done the math on a napkin. What they actually need isn't a better analysis of the tradeoff. They need someone to help them see which of their assumptions is the weak link — and then make the call.

Giving them a 2,000-word comparison of hiring versus ads doesn't solve the confidence problem. It adds more information to a brain already overloaded with information. It makes the decision feel bigger, not smaller.

We found this pattern repeated across thousands of sessions. The founders who asked the most questions and received the most comprehensive answers had the LOWEST implementation rates. The founders who received short, Socratic-style coaching — questions instead of answers, frameworks instead of solutions, constraints instead of options — took action at 3.4x the rate.

This is counterintuitive if you come from a product mindset. More features, more detail, more comprehensiveness feels like more value. But coaching doesn't work like software. Coaching works like a mirror. The best coaches reflect your thinking back to you in a way that makes the right decision obvious.

Why a Generic AI Chat Can't Do This

This is also why pointing ChatGPT or Claude at a founder's problem doesn't produce reliable coaching outcomes. Not because the models aren't smart enough — they are. But because a general-purpose model defaults to giving the most helpful, comprehensive answer it can. That's exactly what you want from a research assistant. It's exactly what you DON'T want from a coach.

A coach needs to strategically withhold information. A coach needs to ask the uncomfortable question instead of answering the comfortable one. A coach needs to know when the founder is using "I need more data" as a stalling mechanism for a decision they're afraid to make.

A general-purpose model will never do this because its optimization target is helpfulness. A coaching model's optimization target is founder ACTION. These objectives actively conflict in about 40% of coaching scenarios.

We measured this directly. We ran 500 sessions with raw Claude 3.5 Sonnet as the coaching engine and 500 sessions with Atlas v3's constrained coaching architecture using the same base model. Same founders. Same question types. Same context.

Raw Claude: 11% follow-through at 7 days. Atlas v3: 38% follow-through at 7 days.

Same underlying intelligence. Radically different architecture. The architecture IS the product.

The Decision Harness: How Atlas v3 Actually Works

Atlas v3 decomposes every coaching interaction into a six-stage pipeline. This is the architecture that produces the 38% follow-through rate.

Stage 1: Context Assembly. Before the founder sees a single response, Atlas pulls their full context layer: company stage, current revenue, burn rate, team composition, recent decisions (and their outcomes), active goals, and — critically — their decision history. How often do they ask for advice and not act? Which domains do they struggle with most? What's their typical decision latency?

This isn't just personalization. It's diagnostic. Atlas uses the context layer to classify the coaching need before choosing a response strategy.

Stage 2: Need Classification. Every founder question maps to one of five coaching modes:

CLARITY: Founder has information but can't see the pattern. Strategy: reflect their data back in a different frame.
CONFIDENCE: Founder knows the answer but won't commit. Strategy: reduce the decision to its smallest testable version.
CAPABILITY: Founder genuinely lacks a skill or framework. Strategy: teach the minimum viable concept, then assign practice.
ACCOUNTABILITY: Founder has decided but isn't executing. Strategy: create a micro-commitment with a specific deadline.
EXPLORATION: Founder is in genuine discovery mode. Strategy: expand the possibility space, then narrow with constraints.

Most AI coaching tools treat every question as CAPABILITY — "the founder needs information." Our data shows that only 23% of coaching questions are actually capability gaps. The other 77% are confidence, clarity, or accountability issues dressed up as information requests.

Stage 3: Constraint Selection. Based on the classified need, Atlas selects a set of response constraints. This is the counterintuitive part: we LIMIT what the AI can say.

For CONFIDENCE sessions, Atlas is constrained to ask a maximum of two questions and then make a direct recommendation in three sentences or fewer. No hedging. No "it depends." No "there are several factors to consider." A clear recommendation, stated with conviction, followed by "What's the smallest version of this you can test this week?"

For CLARITY sessions, Atlas is constrained to respond ONLY with questions. No answers. No frameworks. No data. Just questions designed to surface the assumption the founder hasn't examined.

For ACCOUNTABILITY sessions, Atlas is constrained to ignore the content of the question entirely and instead ask: "What did you commit to doing last time we talked? Did you do it? If not, what got in the way?"

These constraints feel limiting from an engineering perspective. They're the entire reason the system works from a coaching perspective.

Stage 4: Response Generation. The constrained prompt goes to the base model (Claude 3.5 Sonnet). The model generates a response within the constraint boundaries. Because the constraints are tight, the response is focused. Because the context layer is deep, the response is relevant. The combination of tight constraints and deep context produces responses that feel like they came from a coach who's known you for months.

Stage 5: Action Extraction. After every response, Atlas identifies the implicit or explicit action item and extracts it into a structured commitment: what the founder will do, by when, and what success looks like. If there's no action item, Atlas asks for one. This stage exists because our data showed that sessions ending without a concrete commitment had a 4% follow-through rate. Sessions ending with a specific commitment had a 41% rate.

Stage 6: Follow-Through Tracking. 48 hours after the commitment, Atlas checks in. Not with a generic reminder — with a contextual check-in that references the specific commitment and asks for the specific outcome. "You said you'd send the pricing email to 20 prospects by Wednesday. It's Thursday. How did it go?"

This loop — context, classification, constraint, generation, action, follow-up — is what turns an AI chat into a coaching system. Remove any single stage and the follow-through rate drops by 30-50%.

The Data Nobody Talks About: When AI Coaching Fails

Transparency matters, so here's what doesn't work.

Emotional processing. When a founder is dealing with cofounder conflict, burnout, or the psychological weight of a failing business, Atlas performs poorly. The system can identify emotional distress with reasonable accuracy (we detect it in about 70% of cases where founders later self-reported it), but the coaching constraints that work for decision-making feel cold and mechanical for emotional support. We route these sessions to human coaches on the MentorMe platform. AI coaching is not therapy and shouldn't pretend to be.

Novel strategic situations. When a founder faces a genuinely unprecedented strategic situation — a category-creating product, a regulatory environment that's never been navigated, a market that doesn't have analogs — Atlas's coaching quality drops significantly. The constraint architecture works because most founder decisions follow recognizable patterns. When the pattern doesn't exist, the constraints become arbitrary rather than helpful.

Late-stage complexity. Atlas works best for solo founders and teams under 10. As companies grow past 15-20 people, the decision complexity involves organizational dynamics, political considerations, and multi-stakeholder tradeoffs that the context layer can't fully capture. We're building toward this, but we're honest that we're not there yet.

Repeat non-actors. About 15% of founders in our study showed a consistent pattern: they engaged enthusiastically with coaching sessions, agreed to action items, and never followed through — session after session. After the fifth cycle of this, Atlas's follow-through interventions had zero marginal impact. Some people use coaching as a substitute for action, and no architecture can fix that.

What This Means for Founders

If you're a founder using AI tools for advice today, here's the actionable takeaway.

Stop asking AI for comprehensive analysis. The more thorough the AI's answer, the less likely you are to act on it. Instead, ask for the ONE thing you should do first. Not the five things. Not the framework. The one thing.

Force the AI to be direct. If you're using Claude or GPT for business decisions, add this to your prompt: "Give me a direct recommendation in two sentences. No caveats. No 'it depends.' Then tell me the one assumption that could make this recommendation wrong." This mimics the constraint architecture that produces results.

Track your own follow-through. Keep a simple log: what did the AI recommend, did you do it, what happened. If your implementation rate is below 20%, the problem isn't the AI's advice quality. It's your decision process. You're probably asking for information when you actually need accountability.

Choose coaching over consulting. When you interact with an AI, decide in advance: am I looking for INFORMATION (consulting mode) or am I looking for DECISION SUPPORT (coaching mode)? If it's coaching, explicitly tell the AI to ask you questions instead of giving you answers. "Don't tell me what to do. Ask me the question I'm avoiding." You'll be surprised how effective this is.

What's Next

We're publishing the full Atlas v3 architecture specification as an open reference. Not because we're worried about competitors — the constraint design is straightforward; the context layer and training data are the defensible moat. We're publishing it because the AI coaching space is full of products that are just ChatGPT with a different skin. That's not coaching. That's a chatbot with a persona.

Founders deserve better. The research clearly shows that constrained, context-rich AI coaching produces measurably better outcomes than generic AI conversation. If you're building in this space, build real coaching architecture. If you're a founder looking for AI-assisted decision support, demand evidence of outcome tracking, not just response quality.

The 10,247 sessions we ran changed how we build MentorMe. The data was clear: less advice, more accountability. Fewer options, more constraints. Shorter responses, higher follow-through.

If your team is running similar experiments with AI coaching, advisory, or decision support systems, we'd love to compare notes. Reach out at hello@mentorme.com or find us in the MentorMe community.

This is what coaching is supposed to look like in 2026. Not smarter answers. Smarter architecture around the answers.

Keep building with MentorMe

Ready to turn this into action? Start here:

AI-augmented business coaching — a real strategist plus an AI team that executes
small-group founder coaching — climb alongside other founders solving the same problems
MentorMe for coaches — built for your situation, not a generic playbook

Compare MentorMe

vs Clarity.fm vs GrowthMentor vs MentorCruise vs BetterUp AI Mentor for SaaS Founders Fractional CMO for Founders AI Mentor for Solopreneurs

Keep building with MentorMe

Related reading

Compare MentorMe