
It’s February 2026, and I’m writing this with a clear understanding that in six months this post may become a museum exhibit. The field of AI-assisted development is changing so rapidly that any observations go stale faster than milk in the fridge. Still, documenting the current state of affairs seems worthwhile, if only so I can laugh at my own delusions later.
I’ve walked the entire path that most developers go through these days: from code autocompletion in the editor (Copilot and the like), through chat interfaces with copy-pasting code back and forth, then the agentic approach (Claude Code), and finally to full automation with Ralphex. Each stage felt like a breakthrough, and the previous one felt naive in hindsight. I suspect that in six months the current approach will look the same way.
copilot
My journey in AI-assisted development started, like most people’s, with Copilot. Code autocompletion right in the editor felt like magic — sometimes it would amazingly guess exactly what you wanted to write and complete entire blocks for you. In practice, of course, it didn’t always hit the mark, but the feeling of the future was already there. The approach is quite local and tactical, but credit where it’s due — even in today’s reality Copilot remains a very useful thing. Essentially, it has become a modern autocomplete with a certain intelligence, and it works great in that capacity.
chat interfaces
After Copilot stopped being a novelty, models appeared that were good enough for programming. If I’m not mistaken, the first truly suitable model was Sonnet 3.5. One way or another, models became good enough to use as a programming assistant, but in a mode that was, frankly speaking, brutal.
Here’s how it looked for me: you copy a chunk of code into a chat, hoping the context will be sufficient for the model to understand what’s going on. You can’t paste the entire project in there — it won’t fit. You try to discuss the code, solve various problems. Then you copy the result back into your editor, check how correct everything is. If something’s off, you copy it back into the chat, and so on until you reach a state of complete love and understanding.
Back in those days, when I shared my experience using these chat interfaces, I expressed certain doubts about the general chorus of joy claiming a twofold improvement in programming speed. I cautiously argued that the gain was more in the 10-20% range, and looking ahead, I believe I wasn’t far from the truth. It did solve real problems, but had plenty of complexities, and from the vantage point of today’s reality this approach looks, to put it mildly, far from optimal.
early attempts at the agentic approach
Between chat interfaces and full-fledged agents, there was a short series of experiments with programs that tried to implement something resembling the agentic approach. Cursor, Aider, Cody and others all tried to varying degrees to give AI access to the entire project and let it make changes on its own. I used them for a while, but in practice none of these tools really clicked for me, because something was always missing: either the context got lost, or the results were too unpredictable, or the workflow was inconvenient. Nevertheless, the direction was right, and it was precisely these tools that paved the way for what came next.
agents
The next stage, and here too my story is no different from anyone else’s, started in February-March 2025 with the release of Claude Code, i.e. a fully agentic approach to programming tasks. And it was immediately “wow”. A “wow” of the same magnitude as when Copilot first appeared. It was perfectly clear that this was the way forward, even though the road wasn’t without bumps. Over time, models got better, and I, like everyone else, learned to use them more effectively.
The efficiency gains here are obviously no longer the 10-20% we started with. But I would strongly argue against the wildly optimistic claims of tenfold improvements and other miracles. It is what it is — this approach undeniably changes the entire process and shifts the focus from writing code to preparing and planning it. But it doesn’t eliminate the need for human involvement, and I’d say that in real life we’re talking about a 2-3x improvement, not 10x. Nevertheless, this is already a completely different level, and it’s genuinely worth getting used to and learning to use.
the agentic approach and manual mode
My typical workflow with Claude Code looked roughly like this: first, I’d create a detailed plan for the entire task, with subtask breakdowns, descriptions of expected outcomes, and test descriptions. Once the plan was complete and approved by me, the proud human, the machine would cheerfully start chugging along.
In manual mode it looked like this: I’d tell Claude Code “here’s the plan, let’s do task number one”. It would enter planning mode, make a local plan for the specific task, I’d confirm, it would execute. After that I’d run tests, sometimes ask another AI for a second opinion — in my case Codex — and repeat the whole process. Naturally, things didn’t always go smoothly. If Codex found errors, for example, I’d copy its responses back into Claude Code, and that’s how we’d arrive at a reasonably decent result. Sometimes the process repeated several times, and in the end it turned out pretty well.
The whole process is fairly mechanistic, and there’s no real point in having a human in the middle. I tried approaching automation in various ways. For instance, I set up hotkeys in Kitty that automatically transfer the review results from Codex into the adjacent window running Claude Code, and even press Enter for me. You know, minimal automation attempts that helped — less manual copying — but still half-measures.
On top of that, there was another catch with this approach. When I was doing this, Claude Code didn’t yet have automatic context cleanup when switching between tasks. You had to constantly remember where you were in the context. If you forgot to clear the context before starting a new task, you could end up triggering automatic compaction, which both takes a lot of time and doesn’t improve the quality of the result. And in general, when you’re working with a bloated context, everything is both worse and slower.
Now, in February 2026, things are simpler — Claude Code itself offers to switch from plan mode to execution mode and clear the context, and this addresses part of the problem. However, the mechanistic nature of the whole operation is surprising in an era when spaceships are roaming the universe and judgment day is just around the corner.
the ralph loop idea and the birth of ralphex
Right around this time, the Ralph Loop appeared. I won’t claim it was my idea, but it had been bouncing around in my head too. I don’t remember exactly who the author is — google it or ask your favorite AI. The idea is simple: instead of one long-lived agent grinding through all the tasks in a single session, you spin up a fresh, separate agent for each task, and you keep looping until all tasks are complete. At least, that’s my simplified understanding; the original idea isn’t much more complicated.
Armed with this idea plus the experience of manual mechanistic labor, I finally decided to tackle this problem. By this point I already understood what kind of plan I needed and what level of interactivity I wanted, and what I wanted was very low interactivity — in fact the opposite, full automation. Ideally, I want to create a detailed, verified plan, feed it to the machine, and get a result after some time. What’s more, a result that has been reviewed multiple times by different AIs and corrected accordingly. Especially interesting is the mode where one agent reviews, hands off to another, the other fixes things and hands back to the first. In my case, Codex watches from the side and sends comments back to Claude, but Claude also checks things itself by launching reviews with separate sub-agents. And all of this is part of one fully automated process.
The first implementation of this advanced Ralph Loop was written in Python — a short little script of about 70-80 lines. It quickly became obvious that things were quite a bit more complex than they seemed at first. You want to keep logs, you want to inject knowledge from previous executions when starting a new iteration with a fresh context, you want to set limits so it doesn’t churn forever and eat up all your precious tokens, and much more. The script quickly grew to 300-400 lines, and it became clear that maintaining such code would be difficult.
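To make the shape of that loop concrete, here’s a toy sketch in Go (the language the project eventually moved to). The agent call is stubbed out, the prompts and agent names are my illustrative assumptions, and the iteration cap stands in for the “don’t eat all my tokens” guard; none of this is Ralphex’s actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// runAgent would exec a fresh agent process (fresh context) for one task
// and return its output; stubbed here so the loop logic stands on its own.
var runAgent = func(agent, prompt string) string {
	return fmt.Sprintf("[%s] done: %s", agent, prompt)
}

// ralphLoop walks a checkbox plan, launching a fresh agent for every
// incomplete task, capped at maxIters so it can't churn forever.
func ralphLoop(plan []string, maxIters int) []string {
	var out []string
	iters := 0
	for i, task := range plan {
		if !strings.HasPrefix(task, "- [ ] ") {
			continue // already done, or not a task line
		}
		if iters >= maxIters {
			out = append(out, "iteration limit reached, stopping")
			break
		}
		iters++
		title := strings.TrimPrefix(task, "- [ ] ")
		out = append(out, runAgent("claude", "implement: "+title))
		out = append(out, runAgent("codex", "review: "+title))
		plan[i] = "- [x] " + title // mark the task complete in the plan
	}
	return out
}

func main() {
	plan := []string{
		"- [x] scaffold the CLI",
		"- [ ] add a plan parser",
		"- [ ] wire up the review loop",
	}
	for _, line := range ralphLoop(plan, 10) {
		fmt.Println(line)
	}
	fmt.Println(strings.Join(plan, "\n"))
}
```

The real thing layers logging, knowledge injection between iterations, and actual process management on top of this skeleton, which is exactly where those extra 300 lines went.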
I came to the conclusion that staying on Python wasn’t the right path for a project that promised to grow, so I rewrote it in Go. And that’s how Ralphex started its active life.
I should mention that I rewrote it in Go using the tool itself: the old Python version drove its own rewrite into Go. There’s a certain self-irony in that, and even a somewhat philosophical note. The pattern kept repeating: I solved every task for Ralphex using Ralphex itself. Like a compiler that can compile itself, Ralphex creates and executes plans for its own development.
how it works in practice
One of my goals was to make everything really simple. If I write yet another utility that covers every possible use case and requires fine-tuned configuration, then firstly, nobody besides me will ever use it, and secondly, in the era of constant context switching I myself won’t remember how to properly launch the thing a week later.
creating the plan
Creating a plan for Ralphex isn’t just simple, it’s dead simple, and there are several ways to do it. First, it comes with a skill for Claude Code that lets you create plans right from it. You explain the task, it asks clarifying questions, requests confirmations, and step by step the plan gets brought to completion. Second, Ralphex has its own built-in planning mode — I don’t use it often and prefer to work through Claude Code, but it does roughly the same thing.
That said, there are no special requirements for how exactly the plan gets created. The format is as simple as it gets — plain Markdown where tasks are marked with checkboxes: incomplete [ ], completed [x]. You can create such a plan in anything. There are plenty of ready-made skills for this on GitHub — any brainstorm agent, plan agent, and other similar tools. In short, nothing complicated. A straightforward, simple, and clear plan.
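Since the format is just checkbox lines in plain Markdown, a parser for it fits in a few lines. Here’s a minimal Go sketch of the idea (my own illustration; Ralphex’s actual parser may well differ):

```go
package main

import (
	"fmt"
	"strings"
)

// Task is one checkbox item from a plan file.
type Task struct {
	Title string
	Done  bool
}

// parsePlan pulls checkbox tasks out of plain Markdown. Only lines of the
// form "- [ ] ..." or "- [x] ..." count as tasks; headings and prose are
// ignored.
func parsePlan(md string) []Task {
	var tasks []Task
	for _, raw := range strings.Split(md, "\n") {
		line := strings.TrimSpace(raw)
		switch {
		case strings.HasPrefix(line, "- [ ] "):
			tasks = append(tasks, Task{Title: line[6:]})
		case strings.HasPrefix(line, "- [x] "):
			tasks = append(tasks, Task{Title: line[6:], Done: true})
		}
	}
	return tasks
}

func main() {
	plan := `# add metrics endpoint
- [x] pick a metrics library
- [ ] expose /metrics with request counters
- [ ] add tests for the handler`
	for _, t := range parsePlan(plan) {
		fmt.Printf("done=%v  %s\n", t.Done, t.Title)
	}
}
```

That simplicity is the point: any tool that can emit a Markdown list with checkboxes can produce a valid plan.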
execution
In practice, it works like this: there’s a plan broken into tasks. You launch Ralphex, point it at the plan, and off you go. It starts executing tasks one after another, sequentially. This is where the hard part begins, and it’s hard in two ways. First, if the task is big, you have to wait — it’s going to take time. Second, you have to restrain yourself, because there’s nothing you can do anymore. The plan is written, approved, and you have to surrender to the flow. Once it’s launched, you can go have a smoke or prepare another plan and launch it in parallel.
Strictly speaking, there are ways to nudge the process one way or another during execution. But the core idea I was trying to implement is: created the plan, launched it, stepped away.
As for my complaints about it being “long and taking time”: the reason for the wait is that the tasks are genuinely complex. My typical task takes an hour to an hour and a half on average and contains 10 to 15 subtasks, each with 5 to 7 specific items. Doing this by hand in mechanistic mode, I’d probably spend three to four times as long. Before, when I did this manually, I could handle one or two such tasks per day, no more. Now there’s no problem launching several in parallel, and even doing them sequentially you can comfortably fit five to ten tasks in a day. The longest autonomous Ralphex run I’ve had was about 7 hours, but that was a genuinely complex problem involving fundamental changes to a project.
git, commits, and the finalizer
The whole thing is tied to git: during execution, each task gets its own commit, and during review, fixes are also logically committed.
On top of that, there’s a final stage I added just days ago, a sort of finalizer. I noticed that after all tasks are complete, when I review the commits and evaluate the result, I often rebase all these commits into some logical sequence. Having five commits along the lines of “fixed this, fixed that” is just ridiculous for a PR, so I automated this last step too: you can enable the finalizer (it’s disabled by default), which will do a nice rebase for you.
not just code
As it turned out, Ralphex can be used for more than just programming problems. Just yesterday I had a task related to this very blog, namely to improve, expand, and deepen it, which is essentially not so much programming as working with text and content. As an experiment, I tried describing this task as a plan and handing it to Ralphex for execution, and surprisingly the result turned out more than decent. Apparently, the approach itself with its decomposition into tasks and sequential execution with review is universal enough and not limited to writing code.
conclusions
This approach has really grown on me, and I find it sensible, pragmatic, and productive. The one thing that genuinely bothers me is that an error in the plan can be costly: it can set off a chain reaction where subsequent stages build on the slightly corrupted output of previous ones. So the weight of upfront planning increases significantly.
But fundamentally, there’s nothing new here. In our new era of AI executors and juniors, the human role as senior only grows. And that is exactly what seniors do: think with their heads and tell the junior what to do, how to do it, and what result to aim for.
Everything written here is exclusively my personal experience as of February 2026. In six months, things will surely be different, and this post will turn into yet another artifact from the past.
This post was translated from the Russian original with AI assistance and reviewed by a human.