AI agents replacing SaaS workflows: what actually works, six months in.

Last October we helped a client cancel a $3,200/month billing platform. We replaced it with a Claude-based agent, a shared Google Sheet, and about 180 lines of Python glue. Six months on, the agent is still running, the client is still happy, and the savings are real. We also tried this trick on two other clients. One worked. One was a disaster. The disaster taught us more than either success.

Here's what we learned about which workflows actually peel off SaaS, and which ones look like they should but won't.

The billing replacement (worked)

The client is a small B2B services firm. Their old billing platform did three things they used: it took line items from a spreadsheet, generated PDFs, and emailed them on a schedule with a payment link. The rest of the platform — the dashboards, the reporting, the dunning automation — they never touched. They were paying $38,400/year for what was effectively a templated PDF generator with a Stripe connection.

The replacement: a sheet where the bookkeeper drops line items; a daily Claude job that reads the sheet, generates the PDF via a small Python template, and emails it with a Stripe Payment Link; a webhook that flags paid invoices back in the sheet. Total build time was about eleven hours of our work. Ongoing cost is roughly $40/month for API calls and infrastructure. The agent has produced something like 180 invoices and made one mistake we caught (an item description that paraphrased instead of copying verbatim, which we now constrain explicitly).
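The "copy verbatim" constraint can be enforced outside the model with a plain equality check before anything reaches the PDF step. Here is a minimal sketch of that idea, not the client's actual code: LineItem, find_paraphrased_items, and the sample rows are all hypothetical names and data for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    description: str
    amount_cents: int


def find_paraphrased_items(sheet_items, invoice_items):
    """Return (index, expected, actual) for every invoice line that differs from the sheet."""
    if len(sheet_items) != len(invoice_items):
        raise ValueError("invoice has a different number of line items than the sheet")
    mismatches = []
    for i, (src, out) in enumerate(zip(sheet_items, invoice_items)):
        # Exact comparison on purpose: any rewording counts as a failure.
        if src != out:
            mismatches.append((i, src, out))
    return mismatches


# Hypothetical rows: the second description was paraphrased by the agent.
sheet = [LineItem("Q3 advisory retainer", 250000),
         LineItem("On-site workshop (2 days)", 480000)]
draft = [LineItem("Q3 advisory retainer", 250000),
         LineItem("Two-day on-site workshop", 480000)]

problems = find_paraphrased_items(sheet, draft)
for index, expected, actual in problems:
    print(f"line {index}: expected {expected.description!r}, got {actual.description!r}")
```

A non-empty result would block the send and hand the invoice back for review, which keeps the model out of the business of deciding what counts as "close enough."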

What made this work: the task is structured input to structured output. The judgment surface is small. The cost of an error is bounded, because the bookkeeper reviews each invoice before it goes out.

The scheduling assistant (worked)

Same pattern, different domain. A consulting client had a Calendly subscription, an admin assistant booking through email, and a Zapier zap connecting them. Three moving parts doing one job badly. We replaced the whole thing with a Claude agent that reads inbound scheduling emails, checks Google Calendar availability, proposes three slots in a natural reply, books the accepted one, and sends a confirmation. About six hours of work. Cancelled two SaaS subscriptions. Six months in, the agent has handled around 240 scheduling threads with three escalations to a human, all of which were genuinely ambiguous (one was a prospect proposing a phone call from a timezone they hadn't specified).
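The "propose three slots" step is the mechanical core, and it does not need a model at all. A minimal sketch, assuming the busy intervals have already been fetched from the calendar and that every datetime is in one known timezone; propose_slots and the sample day are hypothetical.

```python
from datetime import datetime, timedelta


def propose_slots(busy, window_start, window_end, duration, count=3):
    """Return up to `count` free (start, end) slots inside the window.

    `busy` is a list of (start, end) datetimes, all in the same timezone.
    """
    slots = []
    cursor = window_start
    for busy_start, busy_end in sorted(busy):
        # Fill the gap before this busy block with back-to-back candidates.
        while cursor + duration <= min(busy_start, window_end) and len(slots) < count:
            slots.append((cursor, cursor + duration))
            cursor += duration
        cursor = max(cursor, busy_end)
    # Fill whatever is left after the last busy block.
    while cursor + duration <= window_end and len(slots) < count:
        slots.append((cursor, cursor + duration))
        cursor += duration
    return slots


# Hypothetical day: meetings 09:00-10:30 and 11:00-12:00, working hours 09:00-17:00.
day = datetime(2024, 5, 6)
busy = [(day.replace(hour=9), day.replace(hour=10, minute=30)),
        (day.replace(hour=11), day.replace(hour=12))]
slots = propose_slots(busy, day.replace(hour=9), day.replace(hour=17),
                      timedelta(minutes=30))
for start, end in slots:
    print(start.strftime("%H:%M"), "-", end.strftime("%H:%M"))
```

The single-timezone assumption is the one to watch. When the sender's timezone is unknown, the safer design is to escalate rather than guess, which matches the kind of thread that ended up with a human.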

Same recipe. Structured task. Bounded judgment. Cheap to fix when wrong.

The customer-support replacement (failed)

This is the one we should have known would fail. A client wanted to replace their tier-1 support tool — a low-end ticketing system at maybe $800/month — with an agent that read incoming emails, looked up the customer in their database, and either answered or routed. We built it. It worked great in testing. In production it lasted nine weeks.

The problem was edge cases, and specifically the kind of edge case that requires a human to know which rule to bend. A customer wrote in furious about a charge. The agent's lookup confirmed the charge was correct per the contract. The agent wrote a polite, accurate, contractually-grounded reply explaining why the charge stood. The customer cancelled a $14,000/year account the next morning. A human in that role would have flagged the email, walked it to the account manager, and probably saved the account with a partial refund. The agent did exactly what it was supposed to do. The outcome was a five-figure mistake.

We pulled the agent. The client is back on their old ticketing tool, plus a Claude assistant that drafts responses for humans to approve. That setup has stuck.

The pattern

Agents replace SaaS cleanly when three things are true. The task is mostly mechanical (extract, transform, schedule, format, route). The judgment required is bounded and the rules are explicit. The cost of an error is small relative to the cost of the SaaS. When any one of those three breaks, the math stops working.
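The three conditions can be written down as a blunt screening function. This is a sketch under assumptions: screen_workflow is a hypothetical helper, the two booleans are judgment calls someone has to make honestly, and treating "small relative to" as "worst-case error costs less than a year of the SaaS" is our simplification, not a rule. The numbers in the example are the support case above: roughly $800/month for the tool, and one lost $14,000/year account.

```python
def screen_workflow(mostly_mechanical, rules_explicit, worst_error_cost, annual_saas_cost):
    """Return the reasons a workflow fails the screen; an empty list means 'candidate'."""
    reasons = []
    if not mostly_mechanical:
        reasons.append("task is not mostly mechanical")
    if not rules_explicit:
        reasons.append("judgment is unbounded or the rules are implicit")
    # Assumed threshold: one worst-case error should cost less than a year of the tool.
    if worst_error_cost >= annual_saas_cost:
        reasons.append("a single error can cost more than the SaaS saves in a year")
    return reasons


# The support replacement: answering needed rule-bending, and one bad reply lost $14,000/year.
support = screen_workflow(
    mostly_mechanical=True,
    rules_explicit=False,
    worst_error_cost=14_000,
    annual_saas_cost=800 * 12,
)
for reason in support:
    print("fails:", reason)
```

Run before the build, this would have flagged the support project on two counts.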

The seductive trap is that the agent can do the mechanical part. It absolutely can. What it cannot do, reliably, is know when a situation has exited the mechanical regime and entered the judgment regime. A human support rep knows the difference between a billing inquiry and a "this customer is about to leave" inquiry, sometimes from a single phrase. The agent reads both as billing inquiries.

What we recommend now

Look at your SaaS stack and identify the tools where you only use 10–20% of the features and the parts you use are structured. Those are candidates. The CRM you barely touch except to pull contact lists. The project tool you use only for status updates. The scheduling tool, the templated email generator, the invoice formatter, the Slack-to-spreadsheet logger. Any of these can be a Claude agent with a sheet behind it.

Do not touch anything where the failure mode is a person being upset. Customer support, sales conversations, performance reviews, anything where a wrong-but-correct response can damage a relationship. Those tasks still want a human in the loop, and the right pattern is "agent drafts, human approves," not "agent sends."
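"Agent drafts, human approves" is easiest to keep honest when the send path itself refuses unapproved drafts, rather than relying on process. A minimal sketch of that gate; Draft, approve, send, and NotApprovedError are hypothetical names, and deliver stands in for whatever actually sends the email.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Draft:
    to: str
    body: str
    approved_by: Optional[str] = None  # set only by a human reviewer


class NotApprovedError(Exception):
    """Raised when something tries to send a draft nobody has approved."""


def approve(draft: Draft, reviewer: str) -> Draft:
    draft.approved_by = reviewer
    return draft


def send(draft: Draft, deliver: Callable[[str, str], None]) -> None:
    # The gate lives in the send path, so the agent cannot skip it.
    if not draft.approved_by:
        raise NotApprovedError("draft needs a human approval before sending")
    deliver(draft.to, draft.body)


outbox = []
draft = Draft(to="customer@example.com", body="Thanks for flagging the charge...")

try:
    send(draft, lambda to, body: outbox.append((to, body)))
except NotApprovedError as err:
    print("blocked:", err)

send(approve(draft, reviewer="support-rep"), lambda to, body: outbox.append((to, body)))
print(len(outbox))  # 1
```

The agent only ever constructs Draft objects. The reviewer is the one who reads the furious email and decides whether it goes to the account manager instead.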

One last note. The build cost is now low enough that the calculation is different from a year ago. Eleven hours of work to save $38k/year is an absurd ROI, and we are getting better at the eleven hours every time we do it. If you have a small operations stack and you haven't audited it for agent-replaceable tasks, you are probably leaving real money on the table. Audit it. Just don't audit it with a customer-support tool open in the next tab.
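The arithmetic is worth running on your own numbers. A small sketch using the billing figures from this post ($3,200/month replaced, about $40/month to run, eleven hours to build); the $200 hourly rate is purely an assumption for illustration, and payback is a hypothetical helper.

```python
def payback(saas_monthly, agent_monthly, build_hours, hourly_rate):
    """Net annual saving and months to recoup the build, given an assumed hourly rate."""
    monthly_saving = saas_monthly - agent_monthly
    if monthly_saving <= 0:
        raise ValueError("the agent costs as much as the SaaS it replaces")
    build_cost = build_hours * hourly_rate
    return {
        "annual_saving": monthly_saving * 12,
        "build_cost": build_cost,
        "payback_months": build_cost / monthly_saving,
    }


# hourly_rate=200 is an assumed figure, not one from the engagement.
result = payback(saas_monthly=3200, agent_monthly=40, build_hours=11, hourly_rate=200)
print(result["annual_saving"])  # 37920
print(round(result["payback_months"], 2))
```

Netting out the running cost trims the headline slightly, and at any plausible rate the build pays for itself within the first month.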

Curious which SaaS lines you could cut? We'll audit your stack with you.