From Pilot to Performance: Adopting AI as one of the team
- November 8, 2025
- Posted by: Justin Prince
- Categories: Applied Technology, Human Resources and Industrial Relations
A simple, human‑centric roadmap to make AI actually work
People × Process × Technology, harmonised the LevelUp way.
Meet the AI Geek (the IR Geek’s close cousin on the CPU side)
The AI Geek shares the IR Geek’s allergy to theatre and love of results—same dry humour, less patience for window dressing. They’re the person who quietly says, “Don’t expect a 10× return from a 10‑second setup,” draws a neat box on the whiteboard, and asks, “What job are we actually hiring AI to do?” From there it’s about turning demos into outcomes: clear roles, real context, and a weekly coaching rhythm.
Measure twice, pour once.
A 30‑minute brief beats a 30‑second one—like giving an artist time to create something worth framing.
A short story leaders will recognise
Scene 1 — Kick‑off (Monday, 9:00 a.m.)
The leadership team launches a discovery/analysis/pilot with fanfare. Budgets are promised, sleeves rolled, and someone quotes The Mandalorian: “This is the way.” There’s a deck, a town hall, and a handful of quick demos that look impressive.
Scene 2 — Phase One Done (Week 6)
The pilot wraps. The team learned a lot. Some steps sped up. Edge cases surfaced. “What good looks like” is clearer. The obvious next steps: write the AI Role Charter, build the Context Pack, set Green/Amber/Red decision rights, and wire the workflow so answers land where people work.
Scene 3 — The Stall
Momentum evaporates. Costs get trimmed, time is “reallocated,” commitment wobbles. No one can quite say why—pausing simply feels safer. The pilot slides into the museum of interesting experiments.
Scene 4 — Enter the AI Geek
No scolding—just translation: “We did the exciting bit and skipped the operational bit. We tried to install a model in yesterday’s process. Let’s finish the job.” The AI Geek circles four words—Purpose, Structure, Context, Feedback—and maps the next four weeks so the work sticks.
The four moves (how leaders make AI perform)
1) Write the role (Purpose).
Treat AI like a new hire. A one‑page AI Role Charter spells out:
• Outcome (what changes for customers or cost)
• Constraints & out‑of‑scope
• Decision rights by risk, sketched in code below: Green (autonomous), Amber (AI acts, humans spot‑check 1‑in‑N), Red (human approves first)
AI Geek aside: Prompts aren’t spells; they’re specs. If the spec is fuzzy, the output will be too.
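For teams that want to wire those decision rights into a workflow, here is a minimal sketch. The tier behaviours come straight from the charter above; the `SPOT_CHECK_RATE` value and the `route` function are illustrative assumptions, not a prescribed implementation.

```python
import random
from enum import Enum

class Tier(Enum):
    GREEN = "green"   # AI acts autonomously
    AMBER = "amber"   # AI acts; humans spot-check 1-in-N
    RED = "red"       # a human approves before anything ships

SPOT_CHECK_RATE = 10  # Amber "1-in-N": illustrative value, set it per charter

def route(tier: Tier) -> str:
    """Apply the charter's decision rights to one AI output."""
    if tier is Tier.GREEN:
        return "ship"
    if tier is Tier.AMBER:
        # Ship, but pull roughly 1-in-N outputs into the human review queue.
        return "ship_and_review" if random.randrange(SPOT_CHECK_RATE) == 0 else "ship"
    return "hold_for_approval"  # RED: human approves first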
2) Redesign the workflow (Structure).
Map the handoffs: AI → human → AI. Delete steps you don’t need. Define “done.” Instrument the flow so quality and rework are visible, not guessed; a minimal instrumentation sketch follows the list below.
• People: Give humans the better job—judgement, exceptions, relationships.
• Process: Stage gates by risk; service‑level targets for speed and accuracy.
• Technology: Connect to source‑of‑truth data the AI actually needs.
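To make “instrument the flow” concrete, here is one way the AI → human → AI handoffs might be logged so rework and cycle time come from data rather than anecdotes. The stage names and record fields are assumptions for illustration.

```python
import time
from dataclasses import dataclass

@dataclass
class StageRecord:
    """One handoff in the AI → human → AI flow."""
    stage: str              # e.g. "ai_draft", "human_review" (illustrative names)
    started: float
    finished: float = 0.0
    reworked: bool = False  # did this stage send work back upstream?

records: list[StageRecord] = []

def run_stage(stage: str, work):
    """Run one step and record its timing, so "done" is measured, not guessed."""
    rec = StageRecord(stage=stage, started=time.time())
    result = work()  # the actual step: model call, human check, etc.
    rec.finished = time.time()
    records.append(rec)
    return rec, result

# When a human sends a draft back, mark it so rework is visible:
#   rec.reworked = True
#   rework_rate = sum(r.reworked for r in records) / len(records)
```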
3) Feed it real context (Context).
Build a Context Pack: 10–20 canonical examples (good and bad), definitions, and links to authoritative docs. Version it like code. If it changes performance, it gets a version number.
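As a sketch only: the Context Pack could live as a single versioned file next to the workflow’s code. The file name and field layout below are assumptions; the point is that a pack without a version number refuses to load.

```python
import json
from pathlib import Path

def load_context_pack(path: str = "context_pack.json") -> dict:
    """Load the Context Pack and refuse anything without a version number.

    Illustrative shape:
    {"version": "1.3.0",
     "definitions": {...},
     "examples": [{"input": ..., "label": "good"}, ...],
     "sources": ["https://..."]}
    """
    pack = json.loads(Path(path).read_text())
    if "version" not in pack:
        raise ValueError("The Context Pack changes performance, so it carries a version")
    return pack
```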
4) Coach the system (Feedback).
Run a weekly AI retro: review a random sample of outputs, agree what “good” looks like, and update the charter, examples, and tests. Publish a simple scoreboard (accuracy, rework, cycle time). Confidence ≠ competence—feedback turns one into the other.
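The retro’s arithmetic is simple enough to sketch. Assuming each logged output carries a gold answer, a rework flag, and a cycle time (illustrative field names, not a fixed schema), the weekly scoreboard is three averages over a random sample.

```python
import random
from statistics import mean

def weekly_scoreboard(outputs: list[dict], sample_size: int = 20) -> dict:
    """Draw the retro's random sample and compute the three scoreboard numbers.

    Each output is assumed to look like (illustrative):
    {"answer": ..., "gold": ..., "reworked": bool, "cycle_seconds": float}
    """
    sample = random.sample(outputs, min(sample_size, len(outputs)))
    return {
        "accuracy": mean(o["answer"] == o["gold"] for o in sample),
        "rework_rate": mean(o["reworked"] for o in sample),
        "avg_cycle_seconds": mean(o["cycle_seconds"] for o in sample),
    }
```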
The first month (a leader’s view)
Week 1 – Choose & Charter
□ Pick two narrow, high‑volume tasks
□ Baseline today’s speed and quality
□ Draft the AI Role Charter with clear decision rights (Green/Amber/Red)
□ Reminder: Measure twice, pour once — invest in setup (clarity, data, guardrails) to avoid 3× clean‑up
Week 2 – Build the Context Pack
□ Curate 10–20 examples (good and bad)
□ Wire data access to the source of truth
□ Create a small evaluation set to run on changes (regression‑gate sketch below)
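The evaluation set earns its keep as a regression gate: re‑run it on every prompt or Context Pack change. A minimal sketch, assuming a `run_model` callable (hypothetical placeholder) and gold answers stored with each case.

```python
def eval_gate(eval_set: list[dict], run_model, baseline_accuracy: float) -> bool:
    """Re-run the eval set after any prompt or Context Pack change.

    eval_set items are assumed to look like {"input": ..., "gold": ...};
    run_model is whatever calls your model (hypothetical placeholder).
    """
    correct = sum(run_model(case["input"]) == case["gold"] for case in eval_set)
    accuracy = correct / len(eval_set)
    return accuracy >= baseline_accuracy  # ship the change only if the bar holds
```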
Week 3 – Pilot with Guardrails
□ Roll to a small squad
□ Track quality and rework daily
□ Hold your first weekly retro
□ Delete a step the data says you can live without
Week 4 – Publish & Govern
□ Share the charter, controls, and results
□ Tie to a lightweight standard (seatbelt, not parade)
□ Plan the next two use‑cases
AI Geek aside: Don’t scale a pilot. Scale a way of working.
What to remember (and repeat)
• Hire your AI like a person. Write the role before you “turn it on.”
• Seatbelts beat posters. Governance should make the right way the easy way.
• Metrics, not vibes. Track accuracy, rework, cycle time—and translate wins into days back to customers.
• Entropy is undefeated. If nobody owns examples and tests, quality decays. Give someone that job.
The one‑page tools (fill these in)
AI Role Charter (canvas)
• Mission:
• Scope / Out‑of‑scope:
• Inputs & source of truth:
• Decision rights (GAR): Green | Amber | Red
• Quality bar & metrics: accuracy %, rework %, cycle‑time SLO
• Escalation triggers: uncertainty, potential harm, missing data
• Owners & cadence: who reviews what, how often; versioning rules
Context Pack (starter canvas)
• Examples (good/bad):
• Definitions / glossary:
• Authoritative docs & data links:
• Known edge cases + gold answers:
• Change log (what/when/why):
Scoreboard (publish weekly)
• Accuracy vs. gold examples:
• Rework rate (Top 3 causes):
• Cycle time vs. last week:
• Next improvement in flight:
The executive test (five decisions only you can make)
— Which two workflows go first—and why? (Pick where volume and variance meet.)
— Who owns the Context Pack? (If everyone owns it, no one does.)
— What’s your risk posture by tier? (Green/Amber/Red—with examples.)
— What one metric will you celebrate publicly? (Make the benefit tangible.)
— What will you stop doing because the new way is better? (Signal matters.)
Closing from the AI Geek
Measure twice, pour once. Invest a little more up‑front and you’ll get the result you actually need—like giving an artist 30 minutes instead of 30 seconds. Treat AI like a teammate: write the role, feed it context, coach it weekly, and set fair rules. That’s how you turn pilots into performance.
Your first step: Pick one process this month. Hire AI into it with a Role Charter. Build the Context Pack. Run the weekly retro. Publish the scoreboard. Then do it again.
#AI #Leadership #Operations #PeopleProcessTechnology #LevelUp #AIGeek
Appendix: Evidence & further reading
Productivity & workflow redesign
Generative AI at Work (QJE 2025, open access) — https://academic.oup.com/qje/article-abstract/140/2/889/7990658 — Why it matters: Large field study: ~14–15% lift in support agent productivity; biggest gains for novices.
Generative AI at Work (NBER working paper) — https://www.nber.org/papers/w31161 — Why it matters: Working paper version with full methods and heterogeneity results.
GitHub Copilot controlled experiment (arXiv) — https://arxiv.org/abs/2302.06590 — Why it matters: Developers completed a coding task ~55.8% faster with Copilot.
GitHub resource: Measuring Copilot impact — https://resources.github.com/learn/pathways/copilot/essentials/measuring-the-impact-of-github-copilot/ — Why it matters: Exec-friendly summary of multiple studies and metrics.
Navigating the Jagged Technological Frontier (HBS/BCG PDF) — https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf — Why it matters: Consultant field experiment; strong gains inside AI’s strength zone; risks at the frontier.
Adoption, culture & leadership
MIT SMR × BCG — Learning to Manage Uncertainty, With AI (overview) — https://sloanreview.mit.edu/projects/learning-to-manage-uncertainty-with-ai/ — Why it matters: Organizations that combine organizational learning with AI learning perform better under uncertainty.
MIT SMR × BCG — Full report (PDF) — https://web-assets.bcg.com/c1/a7/af0e57dc4b47a31eb7409d981d3e/mitsmr-bcg-ai-report-november-2024.pdf
HBR — Your Organization Isn’t Designed to Work with GenAI — https://hbr.org/2024/02/your-organization-isnt-designed-to-work-with-genai — Why it matters: Treat GenAI as an assistive agent; redesign the work.
HBR — Stop Tinkering with AI — https://hbr.org/2023/01/stop-tinkering-with-ai — Why it matters: Pilots don’t create value unless they scale into redesigned work.
HBR — The Gen AI Playbook for Organizations — https://hbr.org/2025/11/the-gen-ai-playbook-for-organizations — Why it matters: Current, exec-friendly guidance on what leaders should ask and do now.
Governance & standards (seatbelts, not parades)
NIST AI Risk Management Framework — Overview — https://www.nist.gov/itl/ai-risk-management-framework — Why it matters: Four functions: Govern, Map, Measure, Manage.
NIST AI RMF Playbook — https://www.nist.gov/itl/ai-risk-management-framework/nist-ai-rmf-playbook — Why it matters: Practical suggestions mapped to each function.
ISO/IEC 42001:2023 — AI management systems (ISO listing) — https://www.iso.org/standard/42001 — Why it matters: The ISO 27001‑style management‑system standard for AI.
Standards Australia — AS ISO/IEC 42001:2023 — https://www.standards.org.au/blog/spotlight-on-as-iso-iec-42001-2023 — Why it matters: Australian context and adoption note.
EU AI Act — Implementation timeline (official overview) — https://www.europarl.europa.eu/RegData/etudes/ATAG/2025/772906/EPRS_ATA%282025%29772906_EN.pdf — Why it matters: EU timeline and phases to plan your reviews.
Macro context
Stanford HAI — AI Index 2025 (PDF) — https://hai.stanford.edu/assets/files/hai_ai_index_report_2025.pdf — Why it matters: Neutral synthesis for boards; adoption, economics, policy.
LevelUp references
LevelUp — Home — https://lvlup.au/ — Why it matters: People × Process × Technology in practice.
LevelUp — How we work — https://lvlup.au/how-we-work/
LevelUp — Applied Technology — https://lvlup.au/how-we-work/innovative-technology-solutions-work-systems-levelup/