Goodbye Claude and Gemini, Hello Codex!
There have been a few changes in the AI CLI landscape, so it’s time for a new blog post! I’ve once again switched my AI best friend, this time from Claude to OpenAI’s Codex.
Since my last blog post about vibe coding with Claude Code and what I learned, it seems like Claude got really dumb. I actually let my subscription lapse so I could do other stuff (like play with text-to-video AI), so I didn’t witness this firsthand. But judging from the Claude Code subreddit and even this official blog post from Anthropic, it sounds like there was quite the regression (which Anthropic didn’t even admit to for a while, and when they did, they didn’t compensate anyone for it).
During that time I was pretty much using Gemini CLI again (which still kinda sucks, to be honest) and Atlassian’s Rovo CLI, which can use Claude Sonnet and GPT-5 with pretty nice free limits. I also tried Grok Fast (or whatever it was called) when it was free for a week, but it didn’t seem to work that well for me.
But after OpenAI announced GPT-5 and, later, GPT-5 Codex, I was intrigued. I didn’t actually subscribe until I got an offer in my account for a free one-month trial of ChatGPT Business. So I have 5 seats of ChatGPT Business for a month right now (no one in my family accepted my offer to join my team, for some reason).
I’ve been testing Codex out, and so far it’s been as good as or better than Claude Code was when I was trying that. I did notice that Codex takes a bit longer to ramp up, but once it does, the output quality is great. I’ve also been keeping it on the “High” setting for the most part.
The Anime Nano Test
A couple of days ago I got an email from someone about Anime Nano, my anime blog post aggregator. They updated their blog and wanted to log in, but there wasn’t a password recovery feature for them to reset their password. When I saw the message I jumped out of my chair and started getting to work! Mostly because no one has really talked to me about Anime Nano in ages and it was validation that my website was still relevant! And it would be a great test to see how well Codex could work.
I went to work, prompting Codex to implement an email-based password reset feature. I had already implemented one at some point, probably when Anime Nano was built on Django, so I already had some database fields ready for it. I had read a blog post about how Cloudflare Workers could send email through MailChannels without creating any accounts, so Codex wrote an implementation based on that, and when I tried it, it didn’t work! Apparently that feature had been deprecated. Once I realized that, I created a Resend account, got all of the API keys and configuration set up, and had Codex rewrite it with Resend. And it worked!
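For the curious, here’s roughly what the Resend half of a reset flow can look like. This is a minimal sketch, not the code Codex actually wrote for Anime Nano: the function names, the sender address, and the reset URL are all made up for illustration, and it only builds the request so you can see its shape. It assumes Resend’s REST endpoint for sending email and the Web Crypto API that Workers provide.

```typescript
// Hypothetical helper names; this is not Anime Nano's actual code.
interface ResendRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Generate a random hex token with Web Crypto (available in Workers and modern Node).
function makeResetToken(byteLength = 32): string {
  const bytes = new Uint8Array(byteLength);
  crypto.getRandomValues(bytes);
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Build the HTTP request for Resend's "send email" endpoint.
// The sender address is a placeholder; Resend expects a sender on a domain you have verified.
function buildResetRequest(apiKey: string, to: string, resetUrl: string): ResendRequest {
  return {
    url: "https://api.resend.com/emails",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        from: "Example <noreply@example.com>",
        to: [to],
        subject: "Reset your password",
        html: `<p><a href="${resetUrl}">Click here to reset your password</a></p>`,
      }),
    },
  };
}
```

In a Worker you would store the token (ideally hashed, with an expiry) in one of those database fields, then send it with something like `await fetch(req.url, req.init)`.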
When I emailed the user about the password reset, I sadly got a response that the Cloudflare Turnstile widget that I added to reduce the amount of spam and abuse was locking him out of the password reset feature. At that moment I realized that he was a robot and decided to not waste any more time on that clanker, until I was able to reproduce the issue myself. And I’m not a robot (at least I don’t think I am)! The issue boiled down to how the JavaScript was being loaded in an iframe. Or something, I dunno, AI fixed it.
MCPs and CLI and Code Review!
Codex does just as well as or better than Claude Code at using tools like the CLI, and it also works pretty well with this new MCP that I found for controlling Google Chrome. I haven’t played around with this too much yet, but it could be pretty powerful as part of my workflow for updating web apps.
Given a good agents.md file, Codex is also super powerful with CLI tools. I’ve been using it with wrangler (since I’ve mostly been developing on Cloudflare’s platform), and it’s a real time saver. Since it has access to my database schema, I can just ask it things about the data on my local or remote database, and it’ll make queries for me. I absolutely hate writing SQL queries so this is a lifesaver. It’s helped me debug things a lot faster than I could myself, since I can essentially generate queries in plain language.
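To give a feel for what Codex ends up running, here’s a small sketch of the kind of wrangler call involved. It assumes the database is Cloudflare D1 (I’m using `wrangler d1 execute` as the example), and the helper name, database name, and query are all hypothetical.

```typescript
// Hypothetical helper: builds the argument list for `wrangler d1 execute`.
// Assumes a D1 database; "my-database" and the SQL below are placeholders.
type Target = "local" | "remote";

function d1QueryArgs(database: string, sql: string, target: Target): string[] {
  // --local queries the local dev database, --remote queries the deployed one.
  return ["d1", "execute", database, `--${target}`, "--command", sql];
}

const exampleArgs = d1QueryArgs(
  "my-database",
  "SELECT COUNT(*) AS total FROM users;",
  "local",
);

// Shown as a shell line for readability (a real call should pass the array, not a joined string).
const exampleCommand = ["wrangler", ...exampleArgs].join(" ");
```

Passing the arguments as an array to something like Node’s `execFileSync("npx", ["wrangler", ...exampleArgs])` avoids shell quoting headaches with the SQL string.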
I’ve also been having Codex push changes to GitHub and create PRs with the GitHub CLI tool, so that Google Gemini can review the PR and see if it has any issues. Gemini occasionally finds some good issues with the code, which makes me feel better about making PRs. But it’ll also bring up the nittiest of picks if it can’t find anything else to comment on. Just like a real developer!
It sometimes feels like I’m roleplaying as a software engineer at a large company. It’s probably way too much process to create a PR just for me to accept it and merge it to main when I can commit directly to main anyway, and it’s not like anyone else is touching my codebase.
But it is pretty fun to see Gemini comment on Codex’s code. I’d have Claude review it too except I stopped paying Anthropic.
Other stuff
Codex has this online feature where you can have it whip up a virtual environment and then make changes to your code without using your own computer. In theory this is kind of cool, but in practice it seemed to be a lot slower and harder to observe. Each time I wanted to add a turn to the conversation, it would have to whip up the virtual machine again, which took forever.
One interesting thing about this hosted feature is that you can have it run up to 4 attempts concurrently and then pick the best result. I found that when I tried this, the implementations were all pretty similar, which I guess is a good thing. I’m not sure if it’s worth the effort to try something 4 times just to have to read 4x the code to figure out which one is the best. Maybe you could have AI do that part too, so I don’t have to!
I tried testing ChatGPT’s other premium features too, but they didn’t work as well. For some reason, the deep research features of AI always seem to disappoint me. Maybe I’m just really good at researching things, but the AI always seems to give me a suboptimal result. Like if I ask it to go to a page and analyze all of the things on that page, it’ll go through maybe 50%, get bored, and then finish and tell me it’s done. So I sort of consider AI to be like a lazy intern, since you need to keep poking at it until it gives you what you want. The problem is that, for a lot of tasks, it takes about as much effort to babysit AI as it does to just do the thing yourself. So I guess it’s more about finding the tasks where AI is good (and at this point, re-evaluating that every month or so as AI gets better at things).
Conclusion
Overall, I like the way that Codex works out of the box. Maybe it’s because I already had some context files set up (I just made some symlinks from Gemini.md and Claude.md to agents.md). But it feels like Codex is good at finding the right context to add in order to make reasonable changes. The changes I asked for weren’t groundbreaking or anything, but that’s probably like 99% of what needs to be done in code bases anyway.
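If you want to try the symlink trick, here’s a small sketch using Node’s `fs` module. The function name is made up, and the file names follow the spelling used in this post, so adjust the case to whatever your tools actually look for. The demo at the bottom runs in a throwaway temp directory instead of a real repo.

```typescript
import { lstatSync, mkdtempSync, readFileSync, symlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// True if anything (file, directory, or even a dangling symlink) exists at the path.
function pathExists(path: string): boolean {
  try {
    lstatSync(path);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper: point each alias file at one shared context file.
// Existing files are left alone so nothing gets overwritten.
function linkContextFiles(
  dir: string,
  target = "agents.md",
  aliases = ["Gemini.md", "Claude.md"],
): string[] {
  const created: string[] = [];
  for (const alias of aliases) {
    const linkPath = join(dir, alias);
    if (pathExists(linkPath)) continue;
    // A relative target is resolved from the link's own directory.
    symlinkSync(target, linkPath);
    created.push(alias);
  }
  return created;
}

// Demo in a temporary directory.
const demoDir = mkdtempSync(join(tmpdir(), "agents-demo-"));
writeFileSync(join(demoDir, "agents.md"), "# Project notes for AI agents\n");
const createdLinks = linkContextFiles(demoDir);
const viaAlias = readFileSync(join(demoDir, "Claude.md"), "utf8");
```

Because the links are relative, they keep working if the whole project directory gets moved or cloned somewhere else (on systems where symlinks are supported).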
I’ll probably just let this free trial period time out (well, definitely since I added 5 business seats to my account). Then I’ll just have to see if Codex is still the top CLI AI agent at that point, or whether Anthropic recovers public trust, or maybe at that point Gemini 3 will be out and I’ll just play around with that. What a time to be alive! Meanwhile, Anime Nano is racking up features like there’s no tomorrow!
P.S. I used Codex to help me add links and images to this blog post, which made it a lot easier and fun to write. I didn’t have it write any of the actual text though. All bad opinions are my own.