
Foxl v0.2.19 lands a feature that sits right in the sweet spot of "what ChatGPT Plus users actually want from a desktop agent": generating and editing images with gpt-image-2 through your existing subscription. No OpenAI API key, no per-image billing, nothing new to pay for. If you already give OpenAI $20/month, your desktop agent can now make pictures out of that same budget.
## The short version
- New generate_image tool, backed by gpt-image-2 over Codex Responses via ChatGPT OAuth. Auto-registered when ~/.codex/auth.json is present, invisible otherwise.
- GPT-5.5 support added: 1M context, reasoning, streaming. GPT-5.4 also bumped to 1M.
- Fixed a stale namespace alias that was silently rerouting openai-oauth/gpt-5.4 to gpt-4.1, which the ChatGPT OAuth endpoint refuses, producing the confusing "Language model stream error: Bad Request" with no further detail.
- Strands SDK bumped to v1.0.0-rc.5 (mid-execution cancellation, agent-as-tool, invocation-lock-leak fix).
## Why it's not a new API key
The OpenAI Developer Community announcement for gpt-image-2 talks about "the API and Codex." In practice, the ChatGPT Plus/Pro OAuth backend only serves a subset of the catalog to consumer accounts. Live probing the chatgpt.com/backend-api/codex/models endpoint with a Plus account today returns gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, and gpt-5.2 — that's it. Asking Codex to stream model: "gpt-image-2" directly comes back with an explicit 400:
{"detail":"The 'gpt-image-2' model is not supported when using Codex with a ChatGPT account."}
Same story for gpt-5.4-codex, gpt-5.5-codex, gpt-5.5-mini, gpt-5.5-pro. They all exist at the platform level but are gated to API-key / Enterprise accounts. We excluded them from the catalog so users don't see "available" models that instantly fail.
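For reference, the probe is just an authenticated GET. A minimal sketch in TypeScript, assuming the OAuth access token lives at tokens.access_token inside ~/.codex/auth.json (the exact key layout of that file is an assumption):

```ts
// Sketch: probe the consumer Codex catalog with the ChatGPT OAuth token.
// ASSUMPTION: auth.json stores the token under tokens.access_token.
import { readFile } from "node:fs/promises";
import { homedir } from "node:os";
import { join } from "node:path";

async function listCodexModels(): Promise<string> {
  const raw = await readFile(join(homedir(), ".codex", "auth.json"), "utf8");
  const token: string = JSON.parse(raw).tokens.access_token;

  const res = await fetch("https://chatgpt.com/backend-api/codex/models", {
    headers: { Authorization: `Bearer ${token}` },
  });
  return `${res.status} ${await res.text()}`; // Plus account: five models, no gpt-image-2
}

listCodexModels().then(console.log).catch(console.error);
```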
But — and this is the useful part — gpt-image-2 is reachable through the built-in image_generation tool on a regular language model. The official Codex CLI uses exactly this shape under the hood. Foxl now does the same:
```http
POST /backend-api/codex/responses

{
  "model": "gpt-5.4",
  "input": [{ "role": "user", "content": [{ "type": "input_text", "text": "<prompt>" }] }],
  "tools": [{ "type": "image_generation", "model": "gpt-image-2", "size": "1024x1024" }],
  "stream": true
}
```
The response streams back image_generation_call output items whose result field is base64 PNG. We decode it, write the full-resolution file to disk, and ship a resized preview through the agent's tool-result so the UI can render it immediately. Same ChatGPT account, same $20/month, no separate quota.
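A condensed sketch of that consumption path. The event shape here is assumed from the behavior described above, and real SSE parsing needs to buffer partial frames across chunks:

```ts
// Sketch: pull the base64 PNG out of streamed image_generation_call items.
// ASSUMPTION: each SSE `data:` payload may carry an `item` with a base64 result.
import { writeFile } from "node:fs/promises";

async function saveStreamedImage(res: Response, outPath: string): Promise<void> {
  const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
  let buf = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) return;
    buf += value;
    const lines = buf.split("\n");
    buf = lines.pop() ?? ""; // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (!line.startsWith("data:")) continue;
      let event: any;
      try { event = JSON.parse(line.slice(5)); } catch { continue; } // skips [DONE] etc.
      const item = event?.item;
      if (item?.type === "image_generation_call" && item.result) {
        await writeFile(outPath, Buffer.from(item.result, "base64"));
        return; // full-resolution PNG is now on disk
      }
    }
  }
}
```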
## The token-efficient wire shape
Image input and output can get expensive fast if you treat them like text. A 1024×1024 PNG is ~1.4MB as base64, and base64 tokenizes badly, at roughly two to three characters per token, so a single image would burn several hundred thousand tokens if it ever ended up in the model's context window. We don't let that happen.
### Chat-attached images never cross the model's tool-call args
When you drag an image into the chat, the server persists it to data/workspace/attachments/chat-<timestamp>-<rand>.<ext> and silently appends the absolute path to your user message as [Attached image paths: "/abs/path.jpg"]. The model sees a short string of characters — not a megabyte of base64. When it wants to use the image as a reference, it passes that path string to generate_image({ inputImages: [...], prompt: "..." }). The tool itself reads the bytes from disk inside the server process and encodes them into the Codex request as input_image data URIs.
Net result: the expensive bytes ship directly from your disk to OpenAI's edge. The model's tool-call args stay small. Token bill stays small.
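A sketch of the server-side half. The helper name and MIME table are illustrative, not Foxl's actual code; the input_image part shape follows the Responses convention mentioned above:

```ts
// Sketch: read reference images from disk and encode them as input_image
// data URIs, so the model's tool-call args carry only short path strings.
import { readFile } from "node:fs/promises";
import { extname } from "node:path";

const MIME: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
};

async function toInputImages(paths: string[]) {
  return Promise.all(
    paths.map(async (p) => {
      const mime = MIME[extname(p).toLowerCase()] ?? "application/octet-stream";
      const b64 = (await readFile(p)).toString("base64");
      return { type: "input_image", image_url: `data:${mime};base64,${b64}` };
    }),
  );
}
```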
### Output is a path plus a preview, not the full PNG
The tool returns a short block to the model:
```
Image generated
Saved to: /abs/path/generated-2026-04-24-cat-with-blue-eyes-4591.png
Size: 1382.4 KB (PNG)
data:image/jpeg;base64,<~100KB preview>
```
The full-resolution PNG lives on disk. The inline data URI is a sharp-resized 768px JPEG — about 100KB instead of 1.4MB. The UI uses that preview to render the image inline; the original stays untouched at outPath and is accessible via the Open Folder button in the image viewer.
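The preview step is a couple of lines with sharp. A minimal sketch under the stated 768px/JPEG parameters (the quality value is a guess):

```ts
// Sketch: resize the saved PNG to a 768px-wide JPEG and return it as a
// data URI for inline rendering. Quality setting is an assumption.
import sharp from "sharp";

async function previewDataUri(pngPath: string): Promise<string> {
  const jpeg = await sharp(pngPath)
    .resize({ width: 768, withoutEnlargement: true })
    .jpeg({ quality: 80 })
    .toBuffer();
  return `data:image/jpeg;base64,${jpeg.toString("base64")}`;
}
```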
The tool description also tells the model explicitly not to call view_image on its own output and not to describe the image back in prose. Both of those would burn tokens reading something the user can already see in the UI.
## Editing and composing, not just generating
The tool's inputImages parameter accepts up to 10 file paths. One path switches gpt-image-2 into edit mode: "edit this photo" applied to your attached image. Multiple paths trigger compose mode: "combine these into one scene." No additional API key required; the same ChatGPT OAuth session handles it.
Because the server auto-persists chat attachments and injects their paths into the user message, the loop works end-to-end without the user typing file paths. Drag a picture of your cat into the chat, say "make this a watercolor," and the model has everything it needs.
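On the wire, edit and compose reuse the generation request shape, just with image parts next to the prompt. A sketch, reusing the hypothetical toInputImages helper from earlier:

```ts
// Sketch: one reference path means edit, several mean compose.
async function buildImageRequest(prompt: string, inputImages: string[]) {
  return {
    model: "gpt-5.4",
    input: [{
      role: "user",
      content: [
        { type: "input_text", text: prompt },
        ...(await toInputImages(inputImages)), // up to 10 reference images
      ],
    }],
    tools: [{ type: "image_generation", model: "gpt-image-2", size: "1024x1024" }],
    stream: true,
  };
}
```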
## The silent GPT-5.4 bug
Probably the best bug-hunt of the release. Users reported that selecting openai-oauth/gpt-5.4 returned an opaque "Language model stream error: Bad Request" on the very first turn. GPT-5.4 works fine on the Codex account — we proved that with a raw HTTP probe. So something in Foxl was mangling the request.
Turned out to be a stale alias in shared/model-resolver.ts:
```ts
const NAMESPACE_ALIASES: Record<string, string> = {
  // ...
  'openai-oauth/gpt-5.4': 'gpt-4.1',       // <-- this
  'openai-oauth/gpt-5.4-codex': 'gpt-4.1', // <-- and this
};
```
The alias predated real GPT-5.4 availability on Codex; it was a temporary fallback meant to keep things running. Once setModel() called toCanonicalId(), it mapped the user's selection to gpt-4.1 — a model the ChatGPT OAuth endpoint doesn't serve — and the server rejected it. Strands wraps all upstream errors as "Language model stream error: {message}", but in this case Codex returned only a bare 400 with no body, so we got a uniquely useless error string.
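To make the failure path concrete, a hypothetical reconstruction of the resolution step (only the alias table above is verbatim; the function's shape is an assumption):

```ts
// Not Foxl's exact code; uses the NAMESPACE_ALIASES table shown above.
function toCanonicalId(selected: string): string {
  // Alias hit wins; otherwise strip the provider namespace.
  return NAMESPACE_ALIASES[selected] ?? selected.split("/").pop()!;
}

toCanonicalId("openai-oauth/gpt-5.4");
// before the fix: "gpt-4.1" (alias hit, rejected by the OAuth endpoint)
// after the fix:  "gpt-5.4"
```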
Fix was a three-line deletion. The reason it took non-trivial time to find was that the debug hook we use to capture Codex's response body only exists behind FOXL_CODEX_DEBUG=1, which we'd left off. That flag is also new in this release: flip it on and any non-2xx response from chatgpt.com/backend-api/codex/* gets its request + response bodies logged verbatim so future cases like this take five minutes instead of ninety.
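In spirit, the hook is a thin fetch wrapper. A sketch, with the logging format invented for illustration (the real hook's output shape may differ):

```ts
// Sketch: behind FOXL_CODEX_DEBUG=1, dump request and response bodies for
// any non-2xx Codex call. Assumes string request bodies, as in JSON requests.
async function codexFetch(url: string, init: RequestInit): Promise<Response> {
  const res = await fetch(url, init);
  if (!res.ok && process.env.FOXL_CODEX_DEBUG === "1") {
    console.error(`[codex-debug] ${init.method ?? "GET"} ${url} -> ${res.status}`);
    console.error(`[codex-debug] request body: ${String(init.body ?? "")}`);
    // clone() so the caller can still read the original body
    console.error(`[codex-debug] response body: ${await res.clone().text()}`);
  }
  return res;
}
```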
## Strands SDK rc.5
We bumped @strands-agents/sdk from rc.3 to rc.5. No call-site changes needed — Foxl is already on the VercelModel path for the OAuth providers. The interesting bits we inherit:
- Mid-execution cancellation via agent.cancel() and cancellationSignal: AbortSignal.timeout(5000). The agent stops between tool calls, and running tools can forward the signal to fetch to abort in-flight requests (see the sketch after this list).
- Invocation-lock leak fix when a consumer breaks out of for-await-of on agent.stream(). Previously the cleanup path hung forever and subsequent calls threw ConcurrentInvocationError.
- Bedrock thinking + forced tool_choice compatibility: the SDK now strips thinking when tool_choice forces tool use, instead of hitting a 400.
- Context-window overflow detection synced with the Python SDK's patterns, so long-prompt autocompaction behaves consistently.
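A sketch of the cancellation surface as the release notes describe it. The method and option names come from the notes; the Agent construction and event handling around them are assumptions:

```ts
// Sketch only: Agent setup details are illustrative, not rc.5 gospel.
import { Agent } from "@strands-agents/sdk";

const agent = new Agent({ /* model, tools, ... */ });

// Hard deadline: the run aborts between tool calls after 5 seconds.
const run = (async () => {
  for await (const event of agent.stream("summarize the workspace", {
    cancellationSignal: AbortSignal.timeout(5000),
  })) {
    console.log(event);
  }
})();

// Or cancel programmatically, e.g. wired to a UI stop button:
// agent.cancel();
await run;
```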
## Smaller things worth mentioning
- Generated images now live under data/workspace/generated/ with date-stamped, prompt-slug filenames (generated-2026-04-24-cat-with-blue-eyes-4591.png). Workspace root stays clean: just the seven curated memory files.
- Workspace page has a new Open Folder button in the header that pops the workspace directory in Finder / Explorer / xdg-open. The image viewer has one too, and reveals the PNG with the file pre-selected.
- Removed two overlapping skills from the foxl-ai/skills repo: openai-image-gen (needed OPENAI_API_KEY + a Python script) and nano-banana-pro (needed GEMINI_API_KEY + uv + a Python script). The native generate_image tool supersedes both, with no external deps.
- Frontend TypeScript is now zero-error clean. We bumped the web tsconfig target from ES2020 to ES2023 (unlocks findLastIndex, stricter ArrayBuffer typing, and import.meta.env) and fixed a cluster of minor narrowing casts. Nothing was blocking the Vite build, but green tsc makes future refactors safer.
## What's next
gpt-image-2 is the first ChatGPT-subscription-powered capability in Foxl beyond text. The obvious follow-ups are audio (Whisper via the same OAuth route) and video (Sora, when it lands on the consumer endpoint). The image groundwork — path-based tool input, disk-persisted attachments, preview-on-disk output, UI that renders inline without re-reading — generalizes cleanly to both. When the endpoints open up, the plumbing is already there.
For now: install v0.2.19, attach a photo, say "make this a Studio Ghibli illustration," and watch your subscription do something new for you.