The Great Agent Skills Land Grab


10 June 2026 · 8 min

ai · skills

Thousands of AI agent skills have flooded GitHub recently, most of them teaching models what they already know. The land grab is on.

Like many developers, I now lean on AI as a big part of my day-to-day tooling. With appropriate steering, I can have any number of coding agents working with me to accelerate my engineering work. For tasks where more specialized knowledge is needed, I can load in an agent skill to do something the AI model can’t reliably do on its own.

That’s the theory, anyway.

I installed a popular collection recently. A handful of skills about good engineering practice, well-organized, published by a developer I respect.

I loaded them into Claude Code and ran through some tasks I’d normally do by hand. The output was decent.

Then, as an experiment, I removed all the skills and ran the same tasks again. The output was identical.

That’s when I started looking more carefully at what was actually inside some of these repos. What I found there made me rethink why some of them even exist. My conclusion: creating agent skills collections has become a land grab.

A land grab is when new territory opens up, everyone rushes to claim as much of it as possible, and nobody stops to check whether what they’re claiming is actually worth anything to anybody.

An agent skill, as Anthropic designed the format, is a folder containing a SKILL.md file with instructions, optional bundled scripts the agent can execute, and optional reference material loaded only when needed. It’s a sound idea: the agent starts with just a short name and description for each skill, which is enough to understand when to apply it without having to load the entire skill into context.
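To make the load-on-demand idea concrete, here is a minimal sketch of how a tool might index skills. It assumes each SKILL.md opens with a frontmatter block fenced by `---` lines that carries `name` and `description` fields; the function names and the hand-rolled parser are illustrative, not how any particular agent implements it, and a real implementation would use a proper YAML parser.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillSummary:
    name: str
    description: str
    path: Path


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse simple `key: value` pairs between the leading `---` fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize_skills(root: Path) -> list[SkillSummary]:
    """Collect only the name and description of each skill folder under root."""
    summaries = []
    for skill_file in sorted(root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_file.read_text(encoding="utf-8"))
        if "name" in meta and "description" in meta:
            summaries.append(
                SkillSummary(meta["name"], meta["description"], skill_file)
            )
    return summaries


def load_body(summary: SkillSummary) -> str:
    """Read the full instructions only once the skill is judged relevant."""
    text = summary.path.read_text(encoding="utf-8")
    parts = text.split("---", 2)
    return parts[2].strip() if len(parts) == 3 else text
```

Only the summaries sit in context by default; `load_body` is the expensive step, and it runs for the one skill that matches the task.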

What’s appeared on GitHub in the past few months since the format was defined is the land grab in action. Thousands of skills, most of them instructions only, with no bundled code and no platform-specific reference material. Well-known developers and organizations are publishing collections of 50, 100, 200+ skills. Platform companies are shipping marketplaces and leaderboards. An “awesome-agent-skills” list is tracking the whole thing, covering everything from Solana development to AI fitness coaching. The developers with the biggest followings are staking the biggest claims, and their followers are installing whatever they publish.

But look inside some of these repos and you notice something: many of the skills look to have been written by the same models they’re supposed to be teaching. And, of course, that’s what makes the land grab possible. Nobody can realistically publish 200 skills in a weekend by writing them by hand.

Which leads to a simple test for any instructions-only agent skill: could an LLM have written it? If yes, the skill is almost certainly useless, because the knowledge it contains is already in the model’s training data. Loading it into context just spends tokens telling the model something it already knows. That test disqualifies a lot of what’s being published right now.

Open any popular general-purpose skills repo and try the test. You’ll find advice like “write tests before code,” “use semantic HTML,” “measure before optimizing,” and “code-split your bundles.” This is good engineering guidance. It’s also information that LLMs have already absorbed from thousands of blog posts and books. A skill repeating this knowledge adds nothing, and it takes up space that could hold information the model actually needs. If you can generate a skill with a prompt, you don’t need the skill.

The tells are right there in the first few commits: the same structure across dozens of files and a kind of polished completeness with no rough edges. When an LLM writes a skill that gets loaded into another LLM’s context, nothing new enters the system. It’s a closed loop, and handing the output back as context is a no-op.
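The “same structure across dozens of files” tell can be roughly quantified. The sketch below is one possible heuristic, not an established detector: it reduces each skill file to its heading skeleton and reports what fraction of files share the most common one. Skipping level-1 headings is an assumption, made because those usually hold the skill’s own title.

```python
import re
from collections import Counter

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def outline(markdown: str) -> tuple[str, ...]:
    """Return the heading skeleton of a file, ignoring the level-1 title."""
    skeleton = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match and len(match.group(1)) > 1:
            skeleton.append(f"{match.group(1)} {match.group(2).strip().lower()}")
    return tuple(skeleton)


def shared_outline_ratio(documents: list[str]) -> float:
    """Fraction of documents that use the single most common heading skeleton."""
    if not documents:
        return 0.0
    counts = Counter(outline(doc) for doc in documents)
    return counts.most_common(1)[0][1] / len(documents)
```

A ratio close to 1.0 across a large collection is a hint worth following up, nothing more. Hand-written collections use templates too, so this only tells you where to look.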

One developer made the generation loop visible by accident. The first commit of a large and widely starred skills repository contains a CLAUDE.md file: inside, a set of instructions telling Claude Code to generate eight categories of skills across a twelve-week roadmap, complete with projected ROI numbers. The file’s own instructions say to exclude itself from version control, but the commit went through before the .gitignore file was in place; that was added in the second commit. The generation prompt was supposed to be hidden. It wasn’t. I’ll leave the repo unnamed here, since the point isn’t to name-and-shame anyone. The pattern is common enough now that, in the rush to claim ground, the cracks occasionally show.

Another collection of 50+ skills is simply the content from a popular software architecture website, reformatted as SKILL.md files. That website is the kind of public, heavily linked resource that major models have near-certainly ingested. The model read the book; these skills hand it the same book back and call it a new capability.

Other collections chase size. One developer ships over 200 skills spanning engineering, marketing, product management, and “C-level advisory,” which when you look inside means skills telling your coding agent how to prep a board meeting or run a competitive teardown. Another packages what it advertises as over 1,000 skills into an installable library with an npm one-liner, though the actual skills directory holds closer to a tenth of that number, padded out with aggregated collections from other repos.


There is research that backs up the pattern too. A recent analysis published on the Hugging Face blog by Shanshan Zhong, summarizing research by Bosch and Carnegie Mellon, looked at 40,000+ publicly listed skills from the skills.sh marketplace. They found an ecosystem shaped by what’s easy to produce: skill publication happens in short bursts that track shifts in community attention, content is heavily concentrated in software engineering workflows, and there’s a pronounced supply-demand imbalance across categories. The overall picture is of a marketplace racing to generate content rather than solve problems.

The skills that genuinely improve agent output have one thing in common: they contain information the model couldn’t possibly have, or they pair instructions with code that does work the model can’t do on its own.

Jonathan Fulton, a Staff Engineer at Datadog, has written about using skills built around pup, a CLI for querying Datadog metrics, tailing logs, and managing dashboards. The specific flags, query syntax, and output format for that tool aren’t something a model can reliably guess. With the skill loaded, Fulton can say “show me error rates for the experiments service over the last hour” and the agent gets it right. Without the skill, it has to search online for the docs and example code every time, or just guess. That’s a useful skill, because it fills a gap in what the model knows.

Matteo Collina publishes skills encoding his conventions for Node.js and Fastify. He co-created Fastify and has maintained it for years. His opinions on how it should be used, the patterns he prefers, the edge cases he’s run into: that’s information a model can’t work out from just inspecting the public docs. Benchmarks on his fastify-best-practices skill show measurable improvement in agent output across different models. The low baseline scores, measured without the skill loaded, confirm what you’d expect: Fastify-specific patterns (env-schema for config, @fastify/under-pressure for backpressure, and close-with-grace for graceful shutdown) aren’t things models reliably know without guidance. None of that lives in the documentation, so the skill earns its place.

Anthropic’s own pre-built skills work because of what they bundle with the instructions. The xlsx skill pairs its instructions with openpyxl code that Claude runs. The markdown file exposes the functionality to the agent, and the value is in the executable code that sits alongside.

All three examples succeed because the information or capability came from somewhere the model couldn’t reach on its own.

So why do so many generic skills collections keep getting published? Because the whole ecosystem is set up to reward volume over usefulness.

For individual publishers, a repo with 50 skills gets more GitHub stars than a repo with 3, regardless of whether those 50 skills tell the model anything it doesn’t already know. Stars, awesome-list placements, and social media reach all favor big collections over smaller, sharper ones. It’s how a collection ends up advertising 1,000+ skills it doesn’t actually contain.

Now, I don’t think the people publishing these collections are acting in bad faith. The instinct to share engineering knowledge is a good one, and many of the developers involved have genuinely shaped how the industry builds software. But publishing in bulk is what the format incentivizes, and that makes a land grab all but inevitable.

For platform companies, a larger ecosystem justifies the investment in those marketplaces and leaderboards. Install counts and “works with 18+ agents” headlines need big numbers. There’s no reason for anyone to turn away a submitted skill just because it repeats what the model already knows.

And developers keep installing because it feels like the right thing to do. Loading a curated collection of engineering skills gives you the same sense of preparation as installing a recommended set of linter rules or editor extensions. It feels like due diligence. The difference is that a linter rule actually enforces something the editor wouldn’t do on its own, while a skill that says “write clear commit messages” tells the model something it could already tell you. But the instinct is the same: if a respected engineer published it, it must be worth having. The comfort of a well-stocked toolbox is hard to argue with, even if many of the tools duplicate capabilities the machine already has.

There’s one honest counter-argument to make, however. A skill can sometimes change behavior even when its content is already known to the model, by reshaping attention at the right moment. The model knows TDD exists, for example, but it doesn’t always default to writing tests first. A skill that triggers on coding tasks and explicitly says “follow red-green-refactor” might make the model more likely to actually do it. We could call this activation and recall: the skill surfaces knowledge the model already has, at the exact moment it needs it. The better-designed instructions-only skills include anti-rationalization sections that counter the specific ways models skip important steps, and in principle that’s a pattern the model might not produce on its own.

The problem with this defense is that almost nobody publishing these skills is measuring whether they work. The handful of people who have run proper benchmarks, like the teams behind skill-optimizer and Tessl’s evaluation tools, have mostly found strong improvements for skills that encode specialist knowledge and weak or absent improvements for skills that repeat general engineering advice. The largest lift in Tessl’s 880-eval benchmark came from a skill for snipgrapher, a niche CLI of Collina’s, which took the agent from around 52% to 88%. A generic TDD reminder might change behavior a little. It also costs tokens and competes for attention with other instructions. Worse, it can push the model toward workflows that don’t fit the task. “It might help a bit, sometimes” is not a strong case for loading dozens of them. At minimum, a publisher claiming activation-and-recall benefits should be running evals and showing the numbers. Almost none do.
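Showing the numbers does not take much. As a minimal sketch, assuming you have run the same set of eval tasks once without the skill and once with it, the comparison comes down to a difference in pass rates. The names and the 5-point threshold below are illustrative assumptions, not part of skill-optimizer or Tessl’s tooling.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    passed: int
    total: int

    @property
    def pass_rate(self) -> float:
        """Share of eval tasks that passed, or 0.0 for an empty run."""
        return self.passed / self.total if self.total else 0.0


def lift_in_points(baseline: EvalRun, with_skill: EvalRun) -> float:
    """Percentage-point change in pass rate when the skill is loaded."""
    return (with_skill.pass_rate - baseline.pass_rate) * 100


def worth_loading(
    baseline: EvalRun, with_skill: EvalRun, min_lift_points: float = 5.0
) -> bool:
    """Arbitrary cut-off: keep a skill only if it clears a minimum lift."""
    return lift_in_points(baseline, with_skill) >= min_lift_points


# Hypothetical task counts chosen to mirror the 52% and 88% figures above.
baseline = EvalRun(passed=52, total=100)
with_skill = EvalRun(passed=88, total=100)
print(f"lift: {lift_in_points(baseline, with_skill):.1f} points")
```

A small lift on a small task set may be noise, so a real evaluation would also repeat runs and report the spread. The bar being asked for here is simply that some before-and-after number exists.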

If you’re picking skills to install, use the test. Does this skill contain information your model couldn’t already know, or does it ship code that does work the model can’t do on its own? If the answer to both is no, skip it. You’re paying tokens for nothing.

A lot of what’s out there fails the test. The ones that do pass work because their authors wrote down what only they knew. For your codebase, that author is you.

Don’t bother with the general best practices; the model already has those. Write about your world. Describe your project’s file structure, the module boundaries, the patterns your team has settled on and the ones you’ve ruled out. Your internal CLIs, custom build plugins, private APIs, and deployment steps are all places where an agent burns tokens when it guesses wrong, so write those down too. Capture the reasoning behind your decisions: why you picked this state management approach over the alternatives, or what pushed you toward the current architecture. An agent that understands the thinking behind your codebase can extend it well. One that only knows generic best practices will keep suggesting things that cut across choices you’ve already made.

Where the model keeps getting deterministic work wrong, write scripts and let a skill teach the agent when to run them. Each of these skills will be short, specific, and often useless to anyone outside your team. That’s how you know they’re working.
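As an illustration of that split, suppose your agent keeps misnumbering database migrations. Everything in this sketch is hypothetical: the `migrations` directory, the `0001_name.sql` naming scheme, and the script itself. The bundled script does the deterministic check, and the accompanying SKILL.md would only need to say when to run it, for example before proposing any new migration file.

```python
import re
from pathlib import Path

# Hypothetical convention: four-digit sequence number, underscore, snake_case name.
MIGRATION = re.compile(r"^(\d{4})_[a-z0-9_]+\.sql$")


def find_problems(filenames: list[str]) -> list[str]:
    """Return human-readable problems with migration naming and numbering."""
    problems = []
    numbers = []
    for name in sorted(filenames):
        match = MIGRATION.match(name)
        if not match:
            problems.append(f"bad filename: {name}")
            continue
        numbers.append(int(match.group(1)))
    # Numbers must run 1, 2, 3, ... with no gaps or repeats; report the first break.
    for expected, actual in enumerate(numbers, start=1):
        if actual != expected:
            problems.append(
                f"expected migration {expected:04d}, found {actual:04d}"
            )
            break
    return problems


def main(argv: list[str]) -> int:
    """Check a migrations directory; the exit code tells the agent what happened."""
    directory = Path(argv[0]) if argv else Path("migrations")
    if not directory.is_dir():
        print(f"not a directory: {directory}")
        return 2
    issues = find_problems([p.name for p in directory.iterdir() if p.is_file()])
    for issue in issues:
        print(issue)
    return 1 if issues else 0
```

Saved as a script, it would end with `sys.exit(main(sys.argv[1:]))` so the agent can act on the exit code. The script is what makes the skill valuable: the model no longer has to guess at the numbering.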


Don’t expect your skills to appear on a leaderboard or collect thousands of GitHub stars. Don’t get sucked into the land grab.

The rush to publish and install generic skills will slow down as developers notice that loading them doesn’t make their agent noticeably better. What will remain after the rush is the slower, quieter practice of teams writing down their own knowledge so their agents can use it. That’s the type of skill that actually changes what your agent can do, and it’s the one that will have the most impact in your day-to-day work.
