When AI Meets the Help Desk: A Non-Developer's Experiment
How I tried to automate one of the most human jobs in IT operations — and what I learned
I'm not a developer.
I studied programming in university, but for the past 14 years I've been doing infrastructure operations — managing VDI environments, IT assets, and service desks. Real coding drifted further and further from my daily work. Shell scripts here, a small automation there. Nothing like building an actual system with a purpose.
That changed about six months ago.
The Problem I Was Looking At Every Day
In our team, one person owned a very specific job: reviewing IT equipment requests.
Every month, hundreds of requests came through our ITSM system. Employees requesting laptops, desktops, workstations. New hires needing their first device. Contractors needing loaner equipment with the right paperwork. Staff asking for upgrades when their machines got slow.
This one person reviewed all of it. And reviewing didn't just mean clicking approve or reject in a system. It meant:
- Cross-referencing the company's equipment policy document
- Checking what inventory was actually available in the warehouse
- Deciding whether a request was justified based on the person's role, their current device, their usage history
- Handling follow-up questions that came through email, phone, and internal messaging — all at the same time
Roughly two-thirds of their working day went to this. Not because they were slow. Because the volume was relentless, and because every non-standard request required genuine judgment.
Some requests were straightforward. New hire, standard role, equipment matches policy — approve. But many weren't. The device a person asked for didn't match their job category. Their current machine was two years old, but its usage data showed it was barely being used. A contractor's paperwork was complete but their contract expired in three weeks. Each of these needed a human to think.
The Question I Started Asking
I watched this and started wondering: how much of this could AI actually handle?
Not to replace the person. That wasn't the framing. But to ask seriously — if we put an AI agent in front of this workflow, what happens?
I had no idea how to build that. But we're living in the age of vibe coding, and I had something I hadn't had before: a clear problem, a real workflow, and tools that could help a non-developer build something real.
So I started.
What I Actually Built
The system I built has three main components working together.
First, a rule engine. The equipment policy is a document — a real document with actual criteria. Certain job roles get certain equipment grades. Contractors need specific paperwork. Requests can't be re-submitted within a defined period. These rules don't require intelligence. They require consistency. The rule engine checks these conditions first, before anything else touches the request.
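To make that concrete, here's a minimal sketch of what the rule layer looks like in Python. The role-to-grade mapping, the field names, and the 90-day re-submission window are illustrative placeholders, not our actual policy:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Placeholder policy tables; the real values come from the policy document.
ROLE_GRADES = {"engineer": "workstation", "analyst": "laptop", "contractor": "loaner"}
RESUBMIT_WINDOW = timedelta(days=90)  # assumed cooldown between requests

@dataclass
class Request:
    role: str
    device_grade: str
    paperwork_complete: bool
    last_request: date | None = None

def run_rules(req: Request, today: date) -> list[str]:
    """Return rule violations; an empty list means the request moves on."""
    violations = []
    if ROLE_GRADES.get(req.role) != req.device_grade:
        violations.append(f"{req.device_grade!r} does not match the grade for role {req.role!r}")
    if req.role == "contractor" and not req.paperwork_complete:
        violations.append("contractor paperwork incomplete")
    if req.last_request and today - req.last_request < RESUBMIT_WINDOW:
        violations.append("re-submitted inside the cooldown window")
    return violations
```

Nothing clever happens here, and that's the point: these checks are cheap, deterministic, and auditable.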
Second, a RAG pipeline. The policy document gets indexed and retrieved at judgment time. When a request comes in with an unusual justification, the AI can search for the relevant policy section and reason against it — rather than hallucinating a policy that doesn't exist.
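Simplified, the retrieval step looks something like this. It assumes a policy_chunks table with a pgvector embedding column and one of the Gemini embedding models; the schema, the model name, and the column names are illustrative, and genai.configure() with an API key is assumed to have run already:

```python
import psycopg
import google.generativeai as genai

def embed(text: str) -> list[float]:
    # text-embedding-004 is one of the Gemini embedding models.
    return genai.embed_content(model="models/text-embedding-004", content=text)["embedding"]

def retrieve_policy(conn: psycopg.Connection, justification: str, k: int = 3) -> list[str]:
    """Return the k policy sections closest to the request's justification."""
    vec = embed(justification)
    literal = "[" + ",".join(map(str, vec)) + "]"  # pgvector text format
    with conn.cursor() as cur:
        cur.execute(
            "SELECT section_title, body FROM policy_chunks "
            "ORDER BY embedding <-> %s::vector LIMIT %s",  # <-> is L2 distance
            (literal, k),
        )
        return [f"{title}\n{body}" for title, body in cur.fetchall()]
```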
Third, an LLM judgment layer. After the rules run and the relevant policy is retrieved, a language model produces a recommendation: approve, reject, or escalate to a human reviewer. It also generates a confidence score and a plain-language explanation of its reasoning.
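A stripped-down version of that call, using Gemini's JSON output mode. The prompt wording and the response schema here are illustrative, not the production prompt:

```python
import json
import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model choice

def judge(request_summary: str, policy_context: list[str]) -> dict:
    """Ask the model for a decision, a confidence score, and an explanation."""
    context = "\n\n".join(policy_context)
    prompt = (
        "You review IT equipment requests against company policy.\n\n"
        f"Policy excerpts:\n{context}\n\n"
        f"Request:\n{request_summary}\n\n"
        'Respond as JSON: {"decision": "approve" | "reject" | "escalate", '
        '"confidence": <0.0-1.0>, "explanation": "<one short paragraph>"}'
    )
    resp = model.generate_content(
        prompt,
        generation_config={"response_mime_type": "application/json"},  # JSON mode
    )
    return json.loads(resp.text)
```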
The whole thing runs on FastAPI, PostgreSQL with pgvector for the RAG layer, and the Gemini API for the LLM calls. I built it after work hours, mostly alone, using Claude Code and a vibe coding workflow I'd never tried before at this scale.
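Wired together, the flow behind a single endpoint looks roughly like this, reusing Request, run_rules, retrieve_policy, and judge from the sketches above. The route, the payload shape, the 0.8 escalation threshold, and treating any rule violation as an automatic rejection are all simplifications for illustration:

```python
from datetime import date

import psycopg
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
conn = psycopg.connect("dbname=itam")  # assumed DSN
CONFIDENCE_FLOOR = 0.8  # below this, the request goes to the human reviewer

class ReviewPayload(BaseModel):
    summary: str
    role: str
    device_grade: str
    paperwork_complete: bool = True

@app.post("/review")
def review(payload: ReviewPayload) -> dict:
    # Layer 1: deterministic rules short-circuit clear violations.
    violations = run_rules(
        Request(payload.role, payload.device_grade, payload.paperwork_complete),
        today=date.today(),
    )
    if violations:
        return {"decision": "reject", "reasons": violations, "source": "rules"}
    # Layer 2: retrieve the relevant policy sections for this request.
    context = retrieve_policy(conn, payload.summary)
    # Layer 3: LLM recommendation, with low confidence forced to escalate.
    verdict = judge(payload.summary, context)
    if verdict["confidence"] < CONFIDENCE_FLOOR:
        verdict["decision"] = "escalate"
    return {**verdict, "source": "llm"}
```

The one design decision I'd defend anywhere: the system is never allowed to silently auto-decide a low-confidence case. "Escalate" is always an available answer.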
Where My Thinking Changed
When I started, I was optimistic in the way that's probably familiar to anyone who's approached AI for the first time. The straightforward cases — standard request, correct role, policy match, paperwork complete — felt like they'd be easy. And they were. The system handles those well.
But preparing the golden set changed how I thought about the whole thing.
A golden set is a collection of real cases with known correct answers — the labeled dataset you use to measure whether your AI is actually making the right calls. Building it forced me to confront something I'd been hand-waving: the hard cases aren't hard because the information is missing. They're hard because the right answer depends on context that's difficult to formalize.
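Mechanically, using the golden set is simple: run every labeled case through the pipeline and compare against the known-correct answer. A minimal sketch, assuming one JSON case per line in a file:

```python
import json

def evaluate(pipeline, golden_path: str) -> float:
    """pipeline maps a case dict to 'approve' / 'reject' / 'escalate'."""
    total = correct = 0
    with open(golden_path) as f:
        for line in f:
            case = json.loads(line)  # e.g. {"request": {...}, "label": "escalate"}
            total += 1
            correct += pipeline(case["request"]) == case["label"]
    return correct / total
```

The hard part was never this loop. It was deciding what the label should be in the first place.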
A two-year-old device with 90% average CPU utilization and a project deadline three days away — is that an approve or a hold for human review? A contractor whose paperwork is complete but whose contract expires in 25 days — approve or flag? A senior employee requesting equipment above their grade because their role changed six weeks ago and HR hasn't caught up — what's the right call?
These cases require judgment. Not just rule-following.
By the time I'd worked through enough of them, my mental model had shifted. AI wasn't going to replace this person. It was going to be their assistant. Handle the clear cases. Surface the relevant context for the ambiguous ones. Reduce the volume of decisions that required full human attention so the human could focus where they were actually needed.
That reframe made the project more honest — and more useful.
What I Learned About Building This Way
I spent more time before writing a single line of code than I did on the code itself.
The design phase — mapping the workflow, defining what each component needed to do, thinking through edge cases — that's where the real work happened. The prompting and context engineering that came later were only useful because the underlying thinking was clear.
Vibe coding didn't make the thinking easier. It made it possible to execute once the thinking was done.
And when I presented this internally, the reaction was honest: the timing isn't right yet. The technology works. The concept is sound. But deploying it into a live workflow requires organizational readiness that isn't there yet.
That was the right call. A system like this isn't just a technical deployment. It changes how a process works, who's accountable for decisions, and how exceptions get handled. Those questions need answers before the system goes live.
What This Experiment Actually Proved
Not that AI can fully automate equipment request review. It can't — not at the level of nuance that an experienced human brings.
But it proved something more useful: that the structured, repetitive core of this work is automatable. That a well-designed system can handle the clear cases, surface context for the hard ones, and give a human reviewer enough information to make a good decision faster.
In a workflow processing hundreds of requests a month, that's not a small thing.
The bottleneck in most IT service operations isn't that the work is impossible to systematize. It's that no one has taken the time to map what's actually happening, identify what's rule-based versus judgment-based, and build the infrastructure to handle each appropriately.
That's what this experiment was. And it's replicable.
This is the first in a series on applying AI to real IT operations workflows. Next: how the rule engine and LLM layer were designed to work together — and where that boundary between rules and judgment actually sits.