
How We Built AI Into the Way croit Works

By Martin Verges

It started at a workation in the Bavarian Alps

In May 2023, the whole team came together for our annual workation in Berchtesgaden, in the Bavarian Alps. We are a fully remote company, so the week we spend in the same place each year matters; it's where the conversations happen that never quite happen over video. Somewhere between the workshops and the hikes, the two of us stood up in front of everyone and said something that, at the time, still sounded like a bet: AI is not a passing trend, and it is not optional. It is something every one of us is going to use, every day, in the work we already do.

ChatGPT had landed a few months earlier. We had all played with it, and the reaction inside the company was the same one a lot of people had that year: a mix of "this is a toy" and "wait, this is actually useful." We came down firmly on the second side. Not because of the hype, but because we could already see specific places in our own work where a capable language model would save hours: reading through logs, drafting code, working through the kind of dense documentation that Ceph operators live in.

So we made it official. AI was going to be a driving force at croit for the years ahead.

We were also clear about what this did not mean. Nobody was going to be replaced by AI. The way we put it in Berchtesgaden has held up well: no one loses their job to AI, but people who don't use AI will be replaced by people who do, colleagues who become faster, more efficient, and better at their work because they learned to use the tools. The technology isn't here to shrink the team. It's here to make the great team we already have more capable.

Why this fits how we already work

The announcement didn't come out of nowhere. One of the things we expect from everyone at croit is lifelong learning. The technology we work with doesn't stand still, and neither can the people working with it. AI is simply the clearest example of that principle in action: a skill that didn't exist in anyone's job description a few years ago, that everyone is now expected to pick up and keep getting better at. We don't treat that as a burden. It's the same muscle our engineers already use to stay current with Ceph, with the kernel, with the hardware underneath it all.

From announcement to habit

Saying it on a team trip is easy. The harder part was turning it into a habit that actually changed how people work. That takes time, and it is still going on.

Today, every person on the team has access to the strongest models on the market. We don't lock people into a single vendor, because the field moves too fast for that to make sense. People can use Anthropic's Claude models, OpenAI's GPT models, Google's models, and the newer entrants coming out of Asia, such as Z.AI's GLM-5.2. When a better model ships, our people get it. The point is not loyalty to one provider; it's giving engineers the best tool for whatever is in front of them that week.

What grew over these two-plus years was less about access and more about fluency. People learned which model to use for which job, how to prompt for real answers instead of plausible-sounding ones, and where the models are still wrong often enough that you have to check them. That kind of judgment doesn't come from a memo. It comes from using the tools daily until their strengths and limits are obvious.

AI has been a thread through every workation since that first one in Berchtesgaden. At our most recent one, in Antalya, we went a step further and ran intensive hands-on workshops, sitting down together to work through which tools fit which tasks, how to get useful output, and how to fold them into day-to-day work. The goal was simple: help everyone on the team move further along in adoption, not just the people who took to it early. Fluency spreads faster when you build it in the same room.

Privacy by design

Handling data carefully has always shaped how we work. We rely on AI throughout, to review software and increasingly to build it, so the question of where our data ends up was one we settled early. From the start we've worked only with providers on paid plans whose terms keep our data out of any training pipeline, backed by proper GDPR agreements. That was already a deliberately conservative baseline.

But even the best-contracted external service still means data sitting on someone else's infrastructure, under someone else's terms. A data processing agreement makes that defensible. It doesn't make it truly ours. We didn't want defensible. We wanted control.

So we took the obvious next step. Rather than send anything sensitive to an external API, we built AI we run and govern entirely ourselves.

Bringing the models in-house

That meant building our own AI infrastructure, starting with the hardware.

We bought GPUs and now run our own models on our own hardware, for both text and voice. In front of those models sits an LLM gateway wired into our OIDC identity system, plus a chat interface that every employee uses. From a user's point of view it feels like any other chat tool. The difference is where the data goes: nowhere it shouldn't. The prompts, the context, and the customer-specific details all stay inside our infrastructure instead of being handed to a cloud provider.

This solves the GDPR question at the root rather than by policy alone. People don't have to constantly judge whether a given piece of information is "safe enough" to paste into an external tool, because for the sensitive work they have a capable assistant that never leaves our walls. The compliant path and the convenient path are the same path, which is the only way a rule like this survives contact with a busy team.
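To make the gateway idea concrete, here is a minimal sketch of the routing decision such a gateway might make. It is an illustration under assumptions, not our actual implementation: the model names, URLs, the `sensitive` flag, and the `route` function are all hypothetical, and the OIDC token is assumed to have been signature-verified before its claims reach this code.

```python
from dataclasses import dataclass

# Hypothetical routing tables: names and hosts are illustrative only.
INTERNAL_MODELS = {"local-text": "http://gpu-node.internal/v1"}
EXTERNAL_MODELS = {"vendor-text": "https://api.vendor.example/v1"}


@dataclass
class Request:
    claims: dict     # OIDC ID-token claims, assumed already signature-verified
    model: str       # model the user asked for
    sensitive: bool  # assumed flag marking prompts that must stay in-house


def route(req: Request, now: float) -> str:
    """Return the upstream URL for a request, or raise PermissionError."""
    # Reject expired or anonymous tokens before looking at anything else.
    if req.claims.get("exp", 0) <= now:
        raise PermissionError("token expired")
    if "sub" not in req.claims:
        raise PermissionError("token has no subject")
    # In-house models are always allowed for authenticated users.
    if req.model in INTERNAL_MODELS:
        return INTERNAL_MODELS[req.model]
    # External models are reachable only for non-sensitive traffic.
    if req.model in EXTERNAL_MODELS and not req.sensitive:
        return EXTERNAL_MODELS[req.model]
    raise PermissionError("model not allowed for this request")
```

The design point is that the check lives in the gateway rather than in each user's head: a sensitive prompt simply cannot be routed off our hardware.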

Teaching an LLM to run croit

Once the models were in-house and trusted, the interesting work began: connecting them to the things we actually do.

One of the first projects was an MCP server that gives a language model access to the croit interface. Not read-only access: full operational control. Through this server, a model can perform administrative tasks in croit on its own. It can stand up a complete Ceph cluster from scratch. It can change disks, adjust settings, manage pools, tune placement groups. If an action exists in our UI, it can be expressed and driven through the MCP server.

That is a meaningful shift. The same operations a human operator performs through the interface are now available to an AI agent through a structured, well-defined surface. It turns the croit UI from something you click through into something you can describe in plain language and have carried out.
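The shape of that "structured, well-defined surface" can be sketched as a small tool registry: each action gets a name, a description, and typed parameters, and every call the model issues is validated before it runs. This is a simplified illustration, not the MCP protocol or SDK and not croit's API. The tool names, the in-memory `POOLS` state, and the helper functions are all hypothetical.

```python
from typing import Callable

TOOLS: dict[str, dict] = {}


def tool(name: str, description: str, params: dict[str, type]):
    """Register a function as a named tool with typed parameters."""
    def decorator(fn: Callable) -> Callable:
        TOOLS[name] = {"fn": fn, "description": description, "params": params}
        return fn
    return decorator


# In-memory stand-in for cluster state; a real server would call the product API.
POOLS: dict[str, int] = {}


@tool("create_pool", "Create a storage pool", {"name": str, "pg_num": int})
def create_pool(name: str, pg_num: int) -> str:
    POOLS[name] = pg_num
    return f"pool {name} created with {pg_num} placement groups"


@tool("set_pg_num", "Change a pool's placement group count", {"name": str, "pg_num": int})
def set_pg_num(name: str, pg_num: int) -> str:
    if name not in POOLS:
        raise KeyError(f"unknown pool: {name}")
    POOLS[name] = pg_num
    return f"pool {name} now has {pg_num} placement groups"


def list_tools() -> list[dict]:
    """What the model sees: tool names, descriptions, and parameter types."""
    return [
        {"name": n, "description": t["description"],
         "params": {k: v.__name__ for k, v in t["params"].items()}}
        for n, t in TOOLS.items()
    ]


def call_tool(name: str, arguments: dict) -> str:
    """Validate a model-issued call against the declared parameters, then run it."""
    spec = TOOLS.get(name)
    if spec is None:
        raise ValueError(f"unknown tool: {name}")
    if set(arguments) != set(spec["params"]):
        raise ValueError(f"expected parameters {sorted(spec['params'])}")
    for key, expected in spec["params"].items():
        if not isinstance(arguments[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    return spec["fn"](**arguments)
```

A call such as `call_tool("create_pool", {"name": "rbd", "pg_num": 64})` either runs the registered function or fails with a clear error, so a malformed request from the model never reaches the cluster.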

The MCP server ran into walls, though. The models of the time had small context windows, which capped how much state you could keep in front of them, and our API documentation (written for human developers) wasn't good enough for a model to reason over reliably. The gaps a person fills in from experience were exactly the ones a model stumbled on. The MCP server worked, but it worked against those constraints rather than with them.

Models have since gained larger context windows, better reasoning, and more tolerance for imperfect inputs. But rather than keep driving our UI from the outside, we moved on to integrating AI natively into our products, which gives customers a tighter, better-supported experience than bolting an agent onto the interface ever could.

Turning tickets into answers

The second area where this paid off is support, which for a storage company is where a lot of hard-won knowledge lives.

We built tooling that takes an incoming error message or ticket and triangulates it against everything we know. Instead of an engineer starting from a blank page, the system pulls together relevant context from across our accumulated knowledge, cross-references it with known Ceph bugs and similar past cases, and comes back with proposed solutions. It connects the symptom in front of you with the patterns we've seen before and the issues already documented upstream.

The value here isn't replacing the engineer. It's collapsing the slow part, the search through scattered history to figure out whether this problem is new or something we've solved five times already. The judgment stays human. The grunt work of finding the relevant precedent does not.
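The "new or already solved" question can be illustrated with a deliberately simple matcher. This sketch scores an incoming message against past cases by word overlap (Jaccard similarity) and returns the ones above a threshold. It is a stand-in for the idea, not a description of our tooling: the cases, IDs, runbook names, and the threshold value are invented for illustration.

```python
import re

# Hypothetical past cases; IDs, texts, and runbook names are invented.
KNOWN_CASES = [
    {"id": "case-1", "text": "osd process crashed after disk became full",
     "runbook": "runbook-a"},
    {"id": "case-2", "text": "clients report slow requests on metadata server",
     "runbook": "runbook-b"},
]


def tokens(text: str) -> set[str]:
    """Lowercase a message and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words the two sets have in common, between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def triage(message: str, threshold: float = 0.2) -> list[dict]:
    """Return past cases similar to the message, best match first."""
    query = tokens(message)
    scored = [(jaccard(query, tokens(c["text"])), c) for c in KNOWN_CASES]
    hits = [(s, c) for s, c in scored if s >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [{"id": c["id"], "score": round(s, 2), "runbook": c["runbook"]}
            for s, c in hits]
```

An empty result is itself a signal: nothing in the history resembles this ticket, so the engineer knows the problem is probably new before spending time searching.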

Making the codebase searchable by meaning

Underneath both of those is a more fundamental change. Our entire codebase is now indexed using vectorization and retrieval-augmented generation.

In practice, that means our models can find the relevant part of the code by meaning, not just by keyword, and ground their answers in what the code actually does rather than what they guess it might do. When an engineer asks how a particular subsystem behaves, or where a specific behavior is implemented, the assistant can retrieve the real source and build its answer on top of it. The suggestions land faster and they land on target, because they are anchored to our code instead of a model's general impression of how software like ours tends to be written.
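The index, rank, and ground steps can be sketched in a few lines. Everything here is a toy under stated assumptions: the file paths and snippets are invented, and `embed` is a bag-of-words counter standing in for a real embedding model. A word-count vector only matches shared words; it is the learned embedding in a real system that makes retrieval work by meaning. The pipeline shape is the part being illustrated.

```python
import math
import re
from collections import Counter

# Hypothetical code chunks; paths and contents are invented for illustration.
CHUNKS = {
    "pools/create.py": "def create_pool(name, pg_num): validate name and create the pool",
    "disks/replace.py": "def replace_disk(osd_id): drain the osd and swap the disk",
}


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Index once: one vector per chunk.
INDEX = {path: embed(src) for path, src in CHUNKS.items()}


def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the paths of the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(INDEX, key=lambda p: cosine(q, INDEX[p]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, k: int = 1) -> str:
    """Ground the model by putting the retrieved source in front of the question."""
    context = "\n\n".join(f"# {p}\n{CHUNKS[p]}" for p in retrieve(question, k))
    return f"Answer using only this source:\n\n{context}\n\nQuestion: {question}"
```

The prompt produced by `build_prompt` carries the retrieved source with it, which is what lets the model answer from the code rather than from a general impression of it.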

Where this goes next

Two-plus years in, AI at croit is no longer a bet. It's infrastructure: our own GPUs, our own gateway, our own models, wired into our UI, our support flow, and our code.

The direction from here is to keep widening that integration while holding the same line we drew at the start. More of the routine operational work moving to agents, through tooling we build directly into our gateway and through CLI agents, not just one interface, with humans setting intent and reviewing outcomes. Sharper retrieval over our knowledge and code, so the gap between a question and a grounded answer keeps shrinking. And continued freedom to adopt whichever models are genuinely best, rather than betting the company on one provider's roadmap.

What hasn't changed, and won't, is the principle the whole thing rests on: use the best AI available, run it where we can guarantee our customers' data is safe, and treat that boundary as non-negotiable. Everything we've built came from taking both halves of that sentence seriously at the same time.

What's next, and how to reach us

This post stayed at the level of the story. In the ones that follow, we'll get into the technical detail: how the MCP server is built and secured, how we run our own models and gateway, and how the RAG layer over our codebase actually works. If that's the part you came for, it's worth keeping an eye on the blog.

If you'd rather talk it through, reach out. We're happy to share what we learned building this, and we also run AI trainings for teams that want to get their own people fluent, built on the same lifelong-learning approach we apply internally. Either way, get in touch at contact@croit.io.