Unchecked parse/plan success from the checked-in benchmark snapshot.
Reliable tool calling for non-native models.
Start here if you want evidence that non-native models can still call tools safely. StagePilot turns malformed tool text into schema-safe output, shows the benchmark lift, and presents that evidence on a static page rather than a simulated hosted runtime.
Schema-safe parser middleware recovers malformed tool outputs.
One bounded retry closes the remaining gap in the current checked-in 24-case benchmark set.
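The recovery loop described above can be sketched as follows. This is a minimal illustration, not StagePilot's actual implementation: the `SCHEMA` table, the `get_weather` tool, the repair rules, and the `regenerate` callback are all hypothetical names chosen for the example. It shows the two ideas in order: repair and validate loose tool text against a schema, then allow at most one retry before giving up.

```python
import json
import re
from typing import Any, Callable, Optional

# Hypothetical schema: tool name -> exact argument names and their types.
SCHEMA: dict[str, dict[str, type]] = {
    "get_weather": {"city": str, "days": int},
}


def _strip_trailing_commas(text: str) -> str:
    # Remove commas that directly precede a closing brace or bracket.
    return re.sub(r",\s*([}\]])", r"\1", text)


def extract_json_object(text: str) -> Optional[dict[str, Any]]:
    """Find the outermost {...} span in loose model text and try light repairs."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return None
    candidate = match.group(0)
    attempts = (
        candidate,                                             # already valid JSON
        _strip_trailing_commas(candidate),                     # trailing commas
        _strip_trailing_commas(candidate.replace("'", '"')),   # naive quote fix
    )
    for attempt in attempts:
        try:
            parsed = json.loads(attempt)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None


def validate_call(call: Optional[dict[str, Any]]) -> bool:
    """Accept only a known tool with exactly the expected, correctly typed args."""
    if call is None:
        return False
    name, args = call.get("name"), call.get("arguments")
    if not isinstance(name, str) or name not in SCHEMA or not isinstance(args, dict):
        return False
    spec = SCHEMA[name]
    if set(args) != set(spec):
        return False
    return all(type(args[key]) is expected for key, expected in spec.items())


def parse_tool_call(
    raw: str,
    regenerate: Callable[[str], str],
    max_retries: int = 1,
) -> Optional[dict[str, Any]]:
    """Return a schema-valid call, asking the model again at most max_retries times."""
    text = raw
    for attempt in range(max_retries + 1):
        call = extract_json_object(text)
        if validate_call(call):
            return call
        if attempt < max_retries:
            text = regenerate(text)  # bounded: never loops past max_retries
    return None
```

In this sketch, `regenerate` stands in for whatever re-prompts the model with the failed text. Keeping `max_retries` at 1 mirrors the "one bounded retry" claim: a call either validates within two attempts or is rejected, so the caller never receives unvalidated output.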
How the benchmark turns into a real deployment story
Start with the saved benchmark JSON and the parser recovery lift so the user sees evidence before architecture language.
Show the middleware and recovery loop as the actual product core, not as a decorative layer on top of a hosted demo.
Only after the benchmark story lands should you open the launch deck and explain how the same parser package maps to a real API runtime.
What this repo proves
- Parser middleware can make loose tool-call text safe enough for real workflows.
- Reliability claims are tied to checked-in benchmark artifacts, not vague anecdotes.
- Operator review surfaces and developer-ops lanes can be documented separately from the core parser package.
- A static dashboard can still explain trust boundaries, benchmark lift, and adoption posture without pretending to host the full runtime.
30-second evaluation path
- Check the raw pass rate first so the middleware lift is concrete.
- Open the README and benchmark assets before touching the runtime.
- Use the copy bar below when you need a short handoff instead of a long docs walk.
- Service-ready later: the full runtime still maps naturally to Cloud Run or another API host when needed.
Quick start evidence path
Start with benchmark lift, then show the parser recovery story, then end with the copyable review path.
Show the unprotected pass rate before talking about any fix.
Use the middleware lift as the real product proof, not as a side note.
Copy the review path once the benchmark story is already easy to repeat.
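The order above (raw pass rate first, then middleware lift, then retry) can be computed from a saved benchmark file along these lines. This is a sketch under assumptions: the field names `id`, `raw_ok`, `middleware_ok`, and `retry_ok` are hypothetical and may not match the repo's checked-in JSON.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    case_id: str
    raw_ok: bool          # passed with no middleware (the unprotected rate)
    middleware_ok: bool   # passed after parser recovery
    retry_ok: bool        # passed after parser recovery plus one retry


def load_results(json_text: str) -> list[CaseResult]:
    """Parse a JSON array of per-case records (hypothetical field names)."""
    return [
        CaseResult(
            case_id=str(row["id"]),
            raw_ok=bool(row["raw_ok"]),
            middleware_ok=bool(row["middleware_ok"]),
            retry_ok=bool(row["retry_ok"]),
        )
        for row in json.loads(json_text)
    ]


def pass_rate(results: list[CaseResult], stage: str) -> float:
    """Fraction of cases that passed at the given stage."""
    if not results:
        raise ValueError("benchmark has no cases")
    return sum(getattr(case, stage) for case in results) / len(results)


def lift_report(results: list[CaseResult]) -> dict[str, float]:
    """Report each stage's pass rate and the lift over the previous stage."""
    raw = pass_rate(results, "raw_ok")
    middleware = pass_rate(results, "middleware_ok")
    retry = pass_rate(results, "retry_ok")
    return {
        "raw": raw,
        "middleware": middleware,
        "retry": retry,
        "middleware_lift": middleware - raw,
        "retry_lift": retry - middleware,
    }
```

Reporting the raw rate alongside each lift keeps the baseline visible, so a reader can see how much of the final pass rate comes from the parser and how much from the retry.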
Current deployment posture
- Frontend: this static Pages microsite
- Backend: not hosted on Pages by design
- Recommended live runtime: Cloud Run or equivalent API host
- Repo: KIM3310/stage-pilot
Benchmark handoff bar
Shortcut keys: C review path · B benchmark brief · L launch deck · ? help
Use this bar when you need the benchmark story, not the whole repo tour.