Software Development

Automating Patching is 20% of the Problem – What About the Other 80%?

Łukasz Durlak

Published: 2026/06/25

According to the Qualys 2026 Enterprise Patch & Remediation Benchmark, the average mean time to remediation for complex enterprise applications is now five months and ten days. Five months. For a patch that already exists, against a vulnerability that’s already public, on systems that are already in scope.

That number doesn’t describe organizations that lack patching tools. Almost every enterprise we work with has automated patching in some form – cron jobs, Ansible playbooks, vendor agents pushing updates from a dashboard. Qualys themselves report that roughly 40% of the 150 million patches deployed across their customer base last year went out zero-touch – no human in the loop at all.

So, the obvious questions for anyone running infrastructure operations are: if patching is already automated, why does the remediation curve look like that? Why are vulnerability counts still climbing quarter over quarter? Why is the audit team still finding hosts outside policy? And why are engineers still working Friday nights at 22:00?

In our engagements across telecom, financial services and managed service providers, we’ve seen the same answer repeat. Most organizations have automated 20% of the problem and assumed they were done. The other 80% is what determines whether vulnerability counts go up, plateau, or finally start to decline.

The 20% trap: a script is not a platform

When a team says “we automated patching,” what they almost always mean is that someone wrote a playbook that runs apt upgrade, yum update, or the Windows equivalent across a list of hosts. It works. It’s faster than the manual process it replaced and it genuinely saves time.

That’s the 20%. The actual command that installs the update.

The remaining 80% is everything that has to happen around that command for patching to function as a controlled enterprise service, rather than a sequence of weekend fire-fights:

Knowing which hosts need patching, in what order, with what dependencies
Getting approval from application owners before touching their systems
Notifying stakeholders on a schedule they’ve agreed to
Running pre-checks (is the host healthy enough to patch?) and post-checks (did the patch break anything?)
Recording what happened in a way that survives an audit
Making sure the right people can run the right jobs against the right systems – and nobody else
Tracking which hosts are outside policy and why

A naked playbook does none of this. And in environments at any meaningful scale – 2,000 VMs and up – the absence of these capabilities is exactly what causes the vulnerability curve to keep climbing, even if “automation” is in place.
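The pre-check and post-check items in the list above can be sketched as a thin wrapper around the install command. This is a minimal illustration under stated assumptions, not the platform described in this article: `patch_with_checks`, `PatchResult` and the stubbed checks are hypothetical names, and the install step is a stand-in for whatever actually runs the update.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A check is any callable that takes a hostname and returns True when the host passes.
Check = Callable[[str], bool]


@dataclass
class PatchResult:
    host: str
    status: str                      # "patched", "skipped" or "failed"
    notes: List[str] = field(default_factory=list)


def patch_with_checks(host: str,
                      pre_checks: List[Check],
                      install: Callable[[str], None],
                      post_checks: List[Check]) -> PatchResult:
    """Run pre-checks, the install step, then post-checks, recording each failure."""
    result = PatchResult(host=host, status="patched")

    # Pre-checks: is the host healthy enough to patch? If not, leave it untouched.
    for check in pre_checks:
        if not check(host):
            result.status = "skipped"
            result.notes.append(f"pre-check failed: {check.__name__}")
            return result

    # The "20%": the command that actually installs the update.
    install(host)

    # Post-checks: did the patch break anything?
    for check in post_checks:
        if not check(host):
            result.status = "failed"
            result.notes.append(f"post-check failed: {check.__name__}")

    return result


# Hypothetical checks and install step, stubbed so the sketch runs on its own.
def disk_space_ok(host: str) -> bool:
    return host != "db01"


def service_responding(host: str) -> bool:
    return True


installed: List[str] = []
results = [
    patch_with_checks(h, [disk_space_ok], installed.append, [service_responding])
    for h in ("web01", "db01")
]
```

The point of the sketch is that a host failing a pre-check is never touched, and that every outcome leaves a record that can be attached to a ticket.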

What we’ve seen go wrong

Two patterns recur often enough that we now treat them as diagnostic.

Pattern one: brownfield rot. A European managed service provider engaged us to modernize a custom-built patching tool that had been running for years. On paper, the client had a clear patching policy: every host, every quarter. In practice, the home-grown patcher had no role-based access control, no support for newer operating systems, no integration with the ticketing system and, critically, no visibility layer. When we mapped what was actually being patched against what should have been patched, we found that a large portion of the estate had been quietly excluded from patching cycles, in some cases for years. The policy was quarterly, but the reality was “never.” The audit was going to fail, and nobody on the operations side could say with confidence which systems were exposed.
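The mapping exercise described above is, at its core, a comparison between the inventory and the patch records. A minimal sketch follows, assuming the inventory is a list of hostnames and the patch log maps each hostname to its last patch date. The 92-day window is our approximation of “every quarter,” and the hostnames and dates are made up for illustration.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Optional

# Assumption: "every host, every quarter" is approximated as "patched within 92 days".
POLICY_WINDOW = timedelta(days=92)


def hosts_outside_policy(inventory: Iterable[str],
                         last_patched: Dict[str, date],
                         today: date) -> Dict[str, str]:
    """Return {host: reason} for every inventoried host that violates the policy window."""
    violations: Dict[str, str] = {}
    for host in sorted(inventory):
        patched_on: Optional[date] = last_patched.get(host)
        if patched_on is None:
            # The host exists in the inventory but has no patch record at all.
            violations[host] = "never patched"
        elif today - patched_on > POLICY_WINDOW:
            violations[host] = f"last patched {(today - patched_on).days} days ago"
    return violations


# Hypothetical data for illustration only.
inventory = ["app01", "app02", "legacy07"]
last_patched = {"app01": date(2026, 6, 1), "app02": date(2025, 11, 3)}
report = hosts_outside_policy(inventory, last_patched, today=date(2026, 6, 25))
```

Here `report` flags `app02` as overdue and `legacy07` as never patched. The second category is the one that stays invisible when nothing compares the inventory with the patch history.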

Pattern two: the human cost. Before our engagement, a major European telecom operator was running its quarterly patch cycles the only way the existing tooling allowed: with engineers logged in late at night and on weekends, manually walking through batches of servers, fixing failures by hand and filing tickets after the fact. The cost wasn’t just overtime. It was attrition risk, on-call burnout, and the slow erosion of the team’s appetite to stay in operations work at all.

In both cases, the organization technically had “patch automation.” Neither organization had a patch automation platform.

The three pillars that actually move the curve

When we redesign these environments, we build around three pillars. None of them is new technology. What’s new is treating them as a single, integrated service rather than a collection of scripts.

Pillar 1 – Lifecycle workflow. The unit of work is not “run a playbook.” It’s ticket → impact analysis → approval → stakeholder notification → pre-test → patch → post-test → ticket close, with every step tied to the records that already exist in ITSM and CMDB. When a change request opens, the platform pulls the affected configuration items, runs CMDB/CSDM impact analysis to flag downstream services, posts the patch window to the right Slack/Teams channels and email distribution lists, runs its pre-checks, executes the patch, validates the result and closes the ticket with full evidence attached. If the post-test fails, the change request is held open and escalated. The engineer’s role moves from “execute the patch” to “review exceptions” – which is the only part of the job that actually requires their judgment.
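The lifecycle above can be sketched as an ordered list of stages that a change request must pass through. This is an illustrative simplification, not the actual platform logic: the stage names mirror the text, while `run_change`, the handler functions and the ticket ID `CHG-0001` are hypothetical. For brevity, the sketch holds the ticket open on any failed stage, whereas the text only specifies that behaviour for a failed post-test.

```python
from typing import Callable, Dict, List, Tuple

# Stage order taken from the lifecycle described above. Opening the ticket is the
# entry point; closing it is the outcome of a fully successful run.
STAGES: List[str] = [
    "impact_analysis",
    "approval",
    "stakeholder_notification",
    "pre_test",
    "patch",
    "post_test",
]


def run_change(ticket_id: str,
               handlers: Dict[str, Callable[[str], bool]]) -> Tuple[str, List[str]]:
    """Walk one change request through every stage, stopping at the first failure.

    Returns the final ticket state and the evidence trail to attach to the ticket.
    """
    evidence: List[str] = []
    for stage in STAGES:
        ok = handlers[stage](ticket_id)
        evidence.append(f"{stage}: {'ok' if ok else 'FAILED'}")
        if not ok:
            # A failed stage keeps the ticket open for human review instead of closing it.
            return "held_open_escalated", evidence
    return "closed", evidence


# Hypothetical handlers: every stage succeeds except a simulated failing post-test.
handlers: Dict[str, Callable[[str], bool]] = {stage: (lambda ticket: True) for stage in STAGES}
handlers["post_test"] = lambda ticket: False
state, evidence = run_change("CHG-0001", handlers)
```

In this run `state` ends up as `held_open_escalated` and the evidence trail records each completed stage. That trail is the part an engineer reviews as an exception.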

Pillar 2 – Standardization, reuse, and RBAC. One canonical playbook per OS family, maintained centrally, reused across every team that operates RHEL hosts, Windows servers, or Kubernetes nodes. RBAC determines who can run which playbook against which inventory. The networking team can patch their own routers but cannot touch database servers. The DBA team can run pre-approved jobs against their estate but cannot modify the playbook itself. The platform team owns the playbooks; the consuming teams own their inventories. This is the structural piece that breaks the typical “every team maintains its own fork of the same broken script” pattern, and it’s the piece that home-grown patchers almost never get right.
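The access rules in this pillar reduce to two questions: may this team run this playbook against this inventory, and may it change the playbook? The sketch below is a toy model of that idea only. It does not use AWX's own RBAC API, and the team, playbook and inventory names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical grants: (team, playbook) -> inventories that team may target with it.
GRANTS: Dict[Tuple[str, str], Set[str]] = {
    ("networking", "patch_network_os"): {"routers"},
    ("dba", "patch_rhel"): {"database_servers"},
}

# Only the platform team may change playbook content; consumers can only run it.
PLAYBOOK_OWNERS: Set[str] = {"platform"}


def can_run(team: str, playbook: str, inventory: str) -> bool:
    """True only if the team holds an explicit grant for this playbook and inventory."""
    return inventory in GRANTS.get((team, playbook), set())


def can_edit_playbook(team: str) -> bool:
    """True only for the team that owns the canonical playbooks."""
    return team in PLAYBOOK_OWNERS
```

With these grants, `can_run("networking", "patch_network_os", "routers")` is allowed, while the same team targeting `database_servers` is denied. The default is deny: anything not explicitly granted is refused.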

Pillar 3 – Visibility and traceability. A single dashboard that answers three questions in real time: what was patched, what was not patched and why. Every exception has a recorded reason – change frozen, owner declined, host unhealthy, dependency blocking – and an owner. Vulnerability scanner output (Qualys, Tenable, Rapid7) is correlated against patch deployment data so the gap between “patch released” and “patch installed in our environment” is visible to operations, security, and the audit team simultaneously. This is the pillar that ends the era of policy saying one thing and reality saying another.
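The correlation described above can be sketched as a join between scanner findings and patch deployment records. The data shapes here are assumptions made for illustration and do not reflect the export format of Qualys, Tenable or Rapid7. `Finding`, `GapRow`, `correlate` and the sample hosts are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class Finding:
    host: str
    patch_id: str      # identifier of the patch that fixes the finding
    released: date     # when that patch was released


@dataclass
class GapRow:
    host: str
    patch_id: str
    installed: bool
    days_exposed: int            # release to install, or release to today if still open
    exception: Optional[str]     # recorded reason, e.g. "change frozen"


def correlate(findings: List[Finding],
              installs: Dict[Tuple[str, str], date],
              exceptions: Dict[str, str],
              today: date) -> List[GapRow]:
    """Join scanner findings with deployment records and recorded exceptions."""
    rows: List[GapRow] = []
    for f in findings:
        installed_on = installs.get((f.host, f.patch_id))
        end = installed_on if installed_on is not None else today
        rows.append(GapRow(
            host=f.host,
            patch_id=f.patch_id,
            installed=installed_on is not None,
            days_exposed=(end - f.released).days,
            exception=exceptions.get(f.host),
        ))
    return rows


# Hypothetical sample data.
findings = [Finding("web01", "PATCH-A", date(2026, 3, 1)),
            Finding("db01", "PATCH-A", date(2026, 3, 1))]
installs = {("web01", "PATCH-A"): date(2026, 3, 15)}
exceptions = {"db01": "change frozen"}
rows = correlate(findings, installs, exceptions, today=date(2026, 3, 31))
```

Each row answers the three dashboard questions for one host and patch: whether it was installed, how long the gap lasted, and the recorded reason if it is still open.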

What changes when the platform is in place

The major European telecom operator’s results are the cleanest illustration we have. After deploying the AWX-based platform with full ITSM integration, RBAC and standardized playbooks across roughly 7,000 VMs spanning Linux, Windows, Kubernetes, Oracle Database, VMware and Palo Alto:

Manual effort dropped by 90%
Annual savings: 200K PLN in immediate overtime elimination, with projected full-scale savings exceeding 1M PLN per year and payback under six months
Quarterly patch cycles now run end-to-end without late-night maintenance windows.

But the metric that matters most for security leadership isn’t on the published case page. It’s the vulnerability count trend.

For the brownfield rebuild at the European managed service provider – over 10,000 VMs, legacy custom patcher replaced with AWX, Katello and ServiceNow – the immediate win wasn’t cost. It was finally being able to answer the auditor’s question: which hosts are outside policy and why? Once that answer existed, closing the gap was a matter of weeks, not quarters.

The human dimension is harder to quantify but, in our experience, it is the change leaders care about most.

When this isn’t worth doing

The pillars above are not universally applicable. There are at least four cases where investing in a platform-grade patch automation rebuild is the wrong call:

You’re below ~500 managed systems. The fixed cost of building the workflow, RBAC and visibility layers is hard to amortize at that scale. A maintained set of playbooks plus a discipline around tickets is usually enough.
Your environment is genuinely homogeneous and stable. A single OS family, minimal third-party software, no compliance pressure – your existing zero-touch automation is probably already capturing most of the available value.
You don’t have an ITSM/CMDB foundation to integrate with. The platform’s value comes from coupling automation to the records of truth your organization already uses. If those records don’t exist or aren’t trusted, fix that first.
Patching is not your bottleneck. If your vulnerability backlog is driven by end-of-life systems, undocumented assets, or compensating-control gaps, a better patch platform will not move the needle. Address the root cause.

For everyone else – the mid-market and enterprise environments running thousands of heterogeneous workloads under quarterly compliance pressure – the platform pillars are where the other 80% of the value lives. The 20% you’ve already automated is real, and worth keeping. It’s just not where the audit, the burnout, or the vulnerability curve get fixed.

FAQ

What does the rollout actually look like, and how quickly do we see value?

It depends on the scope of automation and the environment we deploy into. In practice, it can take from a few weeks to a few months.

The timeline mainly depends on:

how many automation scenarios need to be covered
how many different OS groups and server types need to be onboarded
how complex and standardized the environment is
whether a Content Manager is already in place and can provide approved patching content
whether existing automation already exists and only needs to be onboarded to the platform
whether we need to configure and orchestrate existing playbooks, or build automation from scratch
whether the patching process, approvals, rollback rules and maintenance windows are already defined

The fastest value comes when the customer already has working automation, for example Ansible/AWX playbooks, and we mainly need to onboard them into a controlled platform, add governance, scheduling, reporting, approvals and operational visibility.

If processes and automation need to be designed from zero, the rollout is longer, but we usually still start with a limited scope first, for example one OS group or one non-production environment, so the business can see value early and then scale gradually.

How do we make the business case to leadership when patching already “works”?

It depends on the current pain point, because the business case can be built in a few different ways.

Risk reduction. If the number of open vulnerabilities is very high, or if the backlog keeps growing, then the current process may “work” operationally but not effectively reduce security exposure. Full automation allows the organization to patch faster, process more servers in less time and gradually reduce the vulnerability backlog.

Cost and efficiency. To make this case properly, you need to estimate the total cost of the current patching capability, not only the hours spent on monthly execution. That typically includes:

People time across the whole organization – planning, approvals, coordination, execution, troubleshooting, reporting and follow-up – not just the infrastructure or operations team.
Existing tooling and licenses – vendor patch management products (e.g. Satellite/Foreman, SCCM/Intune, BigFix, WSUS, Ansible Automation Platform, Tanium), repository mirrors, scanning tools and the infrastructure that hosts them.
In-house automation – custom scripts, scheduled jobs, ticketing integrations and reporting glue that someone has to maintain, document and keep compatible with every OS and middleware change.
Domain-specific knowledge – specialists who know how to patch Linux distributions, Windows, databases, middleware, hypervisors and appliances safely. This is often concentrated in a few people and carries a real key-person risk.
Onboarding and training – every new engineer has to learn the local patching process, the exceptions and the rollback procedures before they can be trusted with production.
Indirect costs – downtime windows, missed SLAs, failed change requests, emergency reboots and the audit/remediation work that follows incidents.

Once these are added up, the cost of the current solution is usually much higher than the cost of the tooling line item in the budget. A centralized and automated patching process can significantly reduce that effort, in some cases by up to 90%, depending on the current level of manual work, and it also consolidates fragmented tools and tribal knowledge into a single, documented workflow.

Control and auditability. Even if patching “works”, leadership usually cares about whether the organization can prove what was patched, when it was patched, what failed, what was rolled back and what remains exposed. Automation gives better traceability, standard reporting and more predictable execution – and removes the dependency on individuals remembering how a specific server or application is patched.

So the business case is usually not “we need a new patching tool”. It is more about reducing vulnerability exposure, lowering the total operational cost (people, tools and knowledge combined), improving audit readiness and making the process scalable and less dependent on specific individuals.

We already use Qualys Patch Management, Tenable, or a similar vendor’s automation. Why would we add another layer?

We are not replacing Qualys, Tenable or similar tools. They remain the source of vulnerability intelligence and patch recommendations.

The additional automation layer is there to manage the full remediation process around the patch itself: approvals, maintenance windows, RBAC, pre-checks, post-checks, rollback, ITSM/CMDB updates and audit evidence.

Scanner-based remediation usually works well for standard cases, but complex environments often require more orchestration: clustered databases, middleware, Kubernetes, appliances, application dependencies, restart order, ownership rules and change freezes.
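Restart order and application dependencies are a natural fit for a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to group hosts into patch waves. It assumes that a host's dependencies are patched before the host itself, which may be the reverse of what a given environment needs. The hostnames and the `patch_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


def patch_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group hosts into waves so each host's dependencies sit in earlier waves.

    `depends_on` maps a host to the set of hosts it depends on.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # hosts whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical dependency map: web01 depends on app01, which depends on db01.
waves = patch_waves({
    "web01": {"app01"},
    "app01": {"db01"},
    "db02": set(),
})
```

Hosts in the same wave have no ordering constraints between them, so they can be patched in parallel inside one maintenance window. A circular dependency surfaces as an error before anything is touched.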

The key point is that knowing what to patch is not the same as knowing how to patch it safely in your environment.

So the question is not whether Qualys or Tenable can deploy a patch. The question is whether the organization can control, track and prove remediation end-to-end.

In short: Qualys or Tenable tells us what to fix. The automation layer ensures it is fixed in the right way, by the right team, with the right evidence, while keeping the remediation process independent from a single scanner vendor.

We have already invested in automation tooling such as AWX, SaltStack or similar platforms. Why should we invest further?

You may not need to rebuild anything. The key question is how mature your current automation is.

If you already have a controlled, reusable and auditable automation platform, then the next step may only be optimization or onboarding more use cases. But if your current setup is mostly scripts or playbooks triggered by engineers, then there is usually still a maturity gap.

A simple way to assess this is:

are you automating only technical procedures, or the full operational process?
do you have approvals, maintenance windows and ownership built into the workflow?
are pre-checks, post-checks and rollback standardized?
is access controlled through RBAC?
is the outcome visible in ITSM/CMDB and audit reports?
can different teams reuse the same standard automation instead of maintaining their own versions?

So the investment is not necessarily about buying another tool. It is about moving from “we can run automation” to “we can operate patching as a controlled, repeatable and traceable service”.

This follows the same maturity logic used in automation and orchestration models: organizations usually move from task automation, through process automation, toward governed and integrated orchestration. It also matches this article’s point that the script or playbook is often only the first 20%, while workflow, RBAC, standardization and visibility create the remaining value.

About the author: Łukasz Durlak

Principal System Engineer

Łukasz Durlak is an experienced IT infrastructure leader and Linux automation expert with over 15 years of experience in managing complex, enterprise-scale environments. At Software Mind, he leads a team responsible for Linux operating system services across an infrastructure of approximately 10,000 virtual machines. His areas of responsibility include lifecycle and configuration management, monitoring, security, vulnerability management and automated patching. Together with his team, he ensures the stability, security and continuous development of large-scale Linux environments. Łukasz has extensive experience in infrastructure automation and configuration management, particularly with Ansible Automation Platform, AWX, Chef, SaltStack and Puppet. Throughout his career, he has led global UNIX teams, developed Linux patch management processes, introduced standardization initiatives and designed automation frameworks supporting enterprise IT operations.