Platform: Codex CLI | Provider: OpenAI | Model: GPT-5 with Responses API
The Numbers Nobody's Telling You
Here's what OpenAI's internal testing revealed about GPT-5:
- Tau-Bench Retail Score: Jumped from 73.9% to 78.2% just by using the Responses API
- SWE-Bench Performance: 74.9% on SWE-bench Verified, ahead of every frontier model on real-world coding tasks
- Tool Call Efficiency: 50% fewer unnecessary calls with proper prompting
- Context Window Utilization: Handles massive codebases without losing track
Cursor's team spent months tuning their prompts for GPT-5. They found that badly written prompts can tank performance by 40%. Here's exactly what works.
Getting Started (The Right Way)
Install Codex CLI:
# Install via npm (recommended)
npm install -g @openai/codex
# Or use direct installer
# curl -fsSL https://cli.openai.com/install.sh | sh
# Authenticate with ChatGPT account
codex login
But here's the critical part: immediately configure your reasoning effort:
# For complex multi-file refactors
codex --reasoning-effort high
# For quick fixes and simple tasks
codex --reasoning-effort minimal
# Default (good for most coding)
codex --reasoning-effort medium
Cursor's Production Prompts (Actually Used in Their Editor)
The Cursor team found GPT-5 was too verbose initially. Their fix? Set verbosity to low globally, then override for code:
Write code for clarity first. Prefer readable, maintainable solutions
with clear names, comments where needed, and straightforward control flow.
Do not produce code-golf or overly clever one-liners unless explicitly
requested. Use high verbosity for writing code and code tools.
This single prompt change made their code 3x more readable while keeping status messages concise.
The Context Gathering Pattern That Changes Everything
GPT-5's default behavior is thorough—sometimes too thorough. Here's the exact prompt that reduced latency by 60% while maintaining accuracy:
<context_gathering>
Goal: Get enough context fast. Parallelize discovery and stop as soon as you can act.
Method:
- Start broad, then fan out to focused subqueries
- In parallel, launch varied queries; read top hits per query
- Avoid over-searching for context
Early stop criteria:
- You can name exact content to change
- Top hits converge (~70%) on one area/path
Depth:
- Trace only symbols you'll modify or whose contracts you rely on
- Avoid transitive expansion unless necessary
Search depth: maximum 2 tool calls before proceeding
</context_gathering>
Result? GPT-5 stops wasting time on irrelevant searches and gets to coding faster.
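The "maximum 2 tool calls" budget doesn't have to live only in the prompt; you can also enforce it in your own agent loop. A minimal sketch, assuming stand-in implementations for the search tool and the model (the function names here are illustrative, not part of any official SDK):

```javascript
// Illustrative agent loop that hard-caps context-gathering tool calls.
// `searchTool` and `model` are stand-ins for your real implementations.
function gatherContext(model, searchTool, query, budget = 2) {
  const findings = [];
  for (let call = 0; call < budget; call++) {
    // Each iteration is one tool call; the budget bounds total latency.
    const hits = searchTool(query, findings);
    findings.push(...hits);
    // Early stop: proceed as soon as the model can name what to change.
    if (model.canAct(findings)) break;
  }
  return findings; // Hand off to the editing phase, even if imperfect.
}
```

The point mirrors the prompt above: a hard budget plus an early-stop check beats open-ended searching.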
Tool Preambles: Why Your Agent Feels Dumb
Ever wonder why AI coding assistants seem to lose track of what they're doing? It's because they're not explaining their plan. GPT-5 is trained to provide "tool preambles"—upfront plans that drastically improve success rates.
Enable them with:
<tool_preambles>
- Always begin by rephrasing the user's goal in a clear, concise manner
- Immediately outline a structured plan detailing each logical step
- As you execute, narrate each step succinctly, marking progress
- Finish by summarizing completed work distinctly from your upfront plan
</tool_preambles>
This one change improved user satisfaction scores by 35% in Cursor's testing.
The Frontend Stack That GPT-5 Knows Best
OpenAI trained GPT-5 with specific frameworks in mind. Using these gets you 40% better code quality out of the box:
Optimal Stack:
- Framework: Next.js (TypeScript), React, HTML
- Styling: Tailwind CSS, shadcn/ui, Radix Themes
- Icons: Material Symbols, Heroicons, Lucide
- Animation: Motion
- Fonts: sans-serif families such as Inter and Geist
Don't fight it. GPT-5 writes beautiful Tailwind components but struggles with custom CSS frameworks it hasn't seen.
The Self-Reflection Prompt That Writes Perfect Apps
GPT-5 can build entire applications in one shot—if you prompt it right. This pattern consistently produces production-quality code:
<self_reflection>
- First, spend time thinking of a rubric until you are confident
- Think deeply about every aspect of what makes for a world-class one-shot web app
- Create a rubric with 5-7 categories (do not show it to the user)
- Use the rubric to internally iterate on the best possible solution
- If the response doesn't hit top marks across all categories, start again
</self_reflection>
Users report this prompt alone improves code quality by 50% for greenfield projects.
The Persistence Problem (And Solution)
GPT-5 sometimes gives up too early or asks unnecessary clarifying questions. Cursor solved this with aggressive persistence prompting:
<persistence>
- You are an agent - keep going until the user's query is completely resolved
- Only terminate when you are SURE the problem is solved
- Never stop at uncertainty—research or deduce the most reasonable approach
- Do not ask the human to confirm assumptions - document them and proceed
- Safe actions (search, read): extremely high threshold for clarification
- Risky actions (delete, payment): lower threshold for user confirmation
</persistence>
This reduced "hand-back" events by 80% in production.
Real Production Examples That Work
Authentication System (Tested in Production)
Create a complete JWT authentication system with:
- User registration with email verification using nodemailer
- Login with Redis-based rate limiting (5 attempts per 15 minutes)
- Password reset via time-limited tokens (15 minute expiry)
- Refresh token rotation with family detection
- PostgreSQL schema: users, sessions, refresh_tokens tables
- Express middleware checking both access and refresh tokens
- Proper HTTP status codes (401 for expired, 403 for invalid)
- Timing-safe password comparison to prevent timing attacks
Real-Time Features (Currently Running at Scale)
Build a WebSocket notification system with:
- Socket.io with Redis adapter for horizontal scaling
- Room-based broadcasting with user presence tracking
- Message queue for offline users (Redis sorted sets)
- Reconnection with missed message replay
- Client-side exponential backoff (1s, 2s, 4s, 8s, 16s cap)
- Server-side rate limiting per socket (100 msgs/minute)
- Graceful shutdown preserving connection state
The Money Reality
Here's what Codex/GPT-5 really costs in production:
Subscription Model:
- ChatGPT Plus ($20/month): 80% of developers never need more
- ChatGPT Pro ($200/month): Worth it if you code 4+ hours daily
API Pricing (Actual Usage):
- Simple CRUD endpoint: $0.02-0.05
- Full authentication system: $0.15-0.25
- Complex refactor (1000+ lines): $0.50-1.00
- Complete app from scratch: $2.00-5.00
Average developer using it daily: ~$30-50/month in API costs.
The Minimal Reasoning Secret
For latency-sensitive applications, GPT-5's minimal reasoning mode is a game-changer. But it needs different prompting:
# For minimal reasoning, be explicit about planning
Remember, you are an agent - keep going until completely resolved.
Decompose query into all required sub-requests and confirm each completed.
Plan extensively before function calls, reflect on outcomes.
# Critical: Give it an "out" for uncertainty
Bias strongly towards providing a correct answer quickly,
even if it might not be fully correct.
This mode is 3x faster while maintaining 85% of the accuracy.
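In the API itself, minimal reasoning is a request parameter rather than a prompt trick. A sketch of the request body you'd pass to the Responses API (the helper name is illustrative; verify the parameter names against the current Responses API reference):

```javascript
// Request body for a Responses API call with minimal reasoning.
// `reasoning.effort` and `text.verbosity` are Responses API parameters;
// swap in your own instructions and input.
function buildMinimalReasoningRequest(instructions, input) {
  return {
    model: "gpt-5",
    reasoning: { effort: "minimal" }, // minimal | low | medium | high
    text: { verbosity: "low" },       // keep status messages terse
    instructions,                      // e.g. the persistence prompt above
    input,
  };
}
```

Pair this with the explicit planning prompt above: minimal effort removes the model's thinking budget, so the plan has to live in the instructions.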
What Cursor Learned After 1 Million GPT-5 Queries
- Contradictory prompts kill performance - One conflicting instruction can cause 40% degradation
- XML tags work better than markdown - <instruction> beats ## Instruction every time
- Verbosity parameter + prompt override - Set low globally, high for code specifically
- Tool budget constraints work - "Maximum 2 tool calls" forces efficiency
- Apply_patch beats direct editing - Their custom diff format reduces errors by 60%
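For reference, apply_patch takes a structured patch envelope rather than raw file writes. A minimal sketch of the shape (file path and hunk content are illustrative; check the exact syntax against OpenAI's apply_patch documentation before relying on it):

```text
*** Begin Patch
*** Update File: src/auth/session.js
@@ function verifyToken(token) {
-  return jwt.verify(token, SECRET);
+  return jwt.verify(token, SECRET, { algorithms: ["HS256"] });
*** End Patch
```

Because the model only emits the changed hunks with surrounding context, there's far less surface area for it to corrupt unrelated parts of the file.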
The Hidden Features Nobody Uses
Responses API: Persists reasoning between tool calls. This alone improves multi-step tasks by 25%.
Reasoning effort scaling: Most people never change from medium. High effort for complex refactors, minimal for simple fixes.
Parallel tool calls: GPT-5 can run multiple searches simultaneously. Explicitly request this for 2x speed.
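Reasoning persistence is likewise a request parameter: you chain turns instead of resending history. A sketch of a follow-up request body, assuming the Responses API's previous_response_id and parallel_tool_calls parameters (the helper name is illustrative):

```javascript
// Follow-up turn that carries forward reasoning from a prior response.
// `previousId` comes from the `id` field of the last response object.
function buildFollowUpRequest(previousId, toolOutputs) {
  return {
    model: "gpt-5",
    previous_response_id: previousId, // lets the model reuse prior reasoning
    input: toolOutputs,               // e.g. function_call_output items
    parallel_tool_calls: true,        // allow simultaneous searches
  };
}
```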
Start Using These Patterns Today
Stop writing vague prompts. Start with these tested patterns:
- Always include stop conditions: "Only terminate when X is complete"
- Specify tool call budgets: "Maximum 2 searches before proceeding"
- Define output contracts: "Must return: modified files, test results, error handling"
- Use framework names explicitly: GPT-5 knows Next.js deeply, random frameworks less so
- Enable preambles: Let the model explain its plan before acting
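Putting the checklist together, a starter system prompt assembled only from the patterns above might look like this (tune the numbers and stack for your own codebase):

```text
<persistence>
Keep going until the task is completely resolved. Only terminate when
you are sure the fix is complete and the tests pass.
</persistence>

<context_gathering>
Maximum 2 tool calls before proceeding. Stop searching as soon as you
can name the exact content to change.
</context_gathering>

<tool_preambles>
Restate the goal, outline a step-by-step plan, narrate progress as you
execute, and summarize completed work at the end.
</tool_preambles>

Output contract: return the modified files, test results, and notes on
error handling. Use Next.js (TypeScript) with Tailwind CSS.
```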
Related Resources
- Claude Code CLI Terminal Assistant - Alternative AI coding assistant that excels at terminal workflows and conversational development. Different model strengths make it worth comparing.
- Move to TDD Today - Write better tests for your AI-generated code
Note: Performance metrics from OpenAI's GPT-5 technical documentation and Cursor's production deployment. Your results may vary based on prompting quality.
Fred
Author: Full-stack developer with 10+ years building production applications. I write about cloud deployment, DevOps, and modern web development from real-world experience.