Staff+ Engineer in the SaaS World - The Control Planes


Apr 09, 2025

So far in our journey through the SaaS world, we’ve explored the multitenancy spectrum and the life of a tenant. Today we are going to broaden our view of the SaaS environment and explore control planes.

As an engineer, you’ve almost certainly (perhaps unknowingly) come across control planes. They are a collection of services that manage the system. In the case of the cloud, they are APIs that are used to create, read, update, delete, and perform other operations on the resources (in Azure, for example, this is Azure Resource Manager). Kubernetes also has a control plane that is responsible for the overall state of the cluster (the API server, etcd, scheduler, and controller manager). Most SaaS solutions include a control plane to manage higher-level tasks across all tenants.

Architecting a control plane for your SaaS solution starts with understanding what responsibilities should be within its scope.

Standard Control Plane Responsibilities

Certain capabilities are considered standard and appear in almost all SaaS control planes because they cover the key events in the tenant lifecycle: tenant onboarding and offboarding. We discussed these events when we talked about the life of a tenant. The differences highlighted there (which depend on the kind of entity your product serves) determine the extent to which these events should be automated as part of the control plane and whether self-service is desired.

When automation is key, the control plane will aim to cover the full process of provisioning and deprovisioning tenant-dedicated resources, as well as reconfiguring shared resources so that they are aware of the current tenant setup. This doesn’t mean that the control plane needs to directly implement these processes. It can simply be an invoker and orchestrator for them, while the actual pipelines are implemented in your DevOps platform. Being an invoker and orchestrator is also typical when the onboarding and offboarding processes are less automated, to allow for more customisation. In these cases, some of the work will often be semi-manual or manual, and the control plane will deal with tracking progress and sending notifications.
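
To make the invoker-and-orchestrator role concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `OnboardingRun`, `StepStatus` and `onboard_tenant` are hypothetical, and each step is modelled as a plain callable that stands in for triggering a pipeline in your DevOps platform.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class StepStatus(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class OnboardingRun:
    """Progress record the control plane keeps for one tenant onboarding."""
    tenant_id: str
    steps: dict[str, StepStatus] = field(default_factory=dict)

    @property
    def completed(self) -> bool:
        # Completed only when there is at least one step and all succeeded.
        return bool(self.steps) and all(
            status is StepStatus.SUCCEEDED for status in self.steps.values()
        )


# Hypothetical contract: a trigger starts an external pipeline for the
# given tenant and returns True when that pipeline succeeded.
PipelineTrigger = Callable[[str], bool]


def onboard_tenant(tenant_id: str, pipeline: dict[str, PipelineTrigger]) -> OnboardingRun:
    """Invoke each step in order, record its status, stop at the first failure."""
    run = OnboardingRun(tenant_id, {name: StepStatus.PENDING for name in pipeline})
    for name, trigger in pipeline.items():
        succeeded = trigger(tenant_id)
        run.steps[name] = StepStatus.SUCCEEDED if succeeded else StepStatus.FAILED
        if not succeeded:
            break  # remaining steps stay PENDING for a retry or manual follow-up
    return run
```

The control plane here only sequences the steps and tracks their status. A semi-manual step could be represented by a trigger that opens a ticket and reports back later, which is where a real implementation would need asynchronous state rather than a simple boolean.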

If self-service is desired, the control plane will also be responsible for managing customer admin users. In these scenarios, the control plane is an externally facing administrative application where customers can create their admin accounts to initiate tenant onboarding and offboarding by themselves.

The control plane is also a natural place to host billing responsibilities. Billing and cost management is a separate subject when it comes to solutions delivered using a SaaS business model (one that I intend to explore in more depth in a future article), but from the perspective of architecting a control plane, you should be thinking about how it can expose consumption data, automate invoicing, and whether you want to integrate it with third-party billing providers.
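
As a rough illustration of exposing consumption data for invoicing, the sketch below rolls metered usage up into a per-tenant amount. The `UsageRecord` shape, the meter names and the unit prices are all hypothetical; a real system would likely source prices from a pricing plan or a third-party billing provider.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageRecord:
    tenant_id: str
    meter: str         # hypothetical meter name, e.g. "api_calls"
    quantity: Decimal  # units consumed in the billing period


# Hypothetical unit prices per meter, in a single currency.
UNIT_PRICES = {
    "api_calls": Decimal("0.001"),
    "storage_gb": Decimal("0.10"),
}


def invoice_totals(records: list[UsageRecord]) -> dict[str, Decimal]:
    """Sum quantity * unit price per tenant across all usage records."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for record in records:
        totals[record.tenant_id] += record.quantity * UNIT_PRICES[record.meter]
    return dict(totals)
```

`Decimal` is used instead of `float` so that monetary amounts are not subject to binary floating-point rounding.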

These capabilities cover the higher-level tasks around business operations, but advanced control planes go beyond that.

Advanced Control Plane Responsibilities

The advanced control planes help manage the middle part of the life of a tenant - the struggles.

Just as the control plane is a natural place to track consumption metrics, it’s also a natural place to track operational metrics - feature usage, resource consumption, performance. Tracking metrics quickly evolves into alerting on them, and then into performing automated maintenance operations. A truly advanced control plane is capable of making autonomous decisions about tenant placement, scaling and movement.
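
The progression from tracking to alerting to automated maintenance can be sketched as a small decision function. The metric fields, thresholds and action names below are assumptions chosen for illustration, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantMetrics:
    tenant_id: str
    cpu_utilisation: float  # fraction between 0.0 and 1.0
    p95_latency_ms: float


# Hypothetical thresholds; real values come from observing your solution.
CPU_SCALE_OUT_THRESHOLD = 0.80
LATENCY_ALERT_THRESHOLD_MS = 500.0


def decide_actions(metrics: TenantMetrics) -> list[str]:
    """Map one metrics snapshot to the actions the control plane would take."""
    actions = []
    if metrics.p95_latency_ms > LATENCY_ALERT_THRESHOLD_MS:
        actions.append("alert")      # notify operators
    if metrics.cpu_utilisation > CPU_SCALE_OUT_THRESHOLD:
        actions.append("scale_out")  # automated maintenance operation
    return actions
```

A snapshot such as `TenantMetrics("contoso", cpu_utilisation=0.9, p95_latency_ms=620.0)` would yield both actions, while a healthy tenant yields an empty list.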

Of course, you can’t architect your control plane for these operations from day one, as you need to understand the characteristics of your solution in practice. In fact, you may not want to build a control plane from day one.

When to Start Building a Control Plane?

Architecting and building a control plane is an effort, and every effort should have a business justification (especially in the SaaS world).

If the kind of entity your product serves doesn’t require self-service, the control plane may not provide any customer value.

If you don’t expect an immediate influx of tenants, you probably don’t have an internal need for the control plane either: your team can handle operations for a few tenants, even manually.

In such cases, you should focus on documenting your processes, automating them once stable repeatability emerges, and then building a control plane on top of that as the number of tenants starts to grow.

Single Control Plane, Multiple Control Planes, or Maybe Both?

This is an interesting question and the answer usually lies at the intersection of where you are on the multitenancy spectrum and what your physical deployment model is.

If you are close to the “share everything” (fully shared) model, the most intuitive choice is to have a single global control plane that hosts all the responsibilities. That is, unless you are using deployment stamps for scaling. In this case, it may make more sense to have multiple stamp control planes that handle resource-related responsibilities, while the global control plane provides centralised visibility, makes tenant placement decisions and orchestrates tenant lifecycle events.
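
A tenant placement decision made by a global control plane across deployment stamps might look like the sketch below. The `Stamp` model and the "most free capacity in the requested region" rule are assumptions for illustration; real placement logic usually weighs more factors, such as tenant tier or noisy-neighbour risk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stamp:
    name: str
    region: str
    capacity: int          # maximum number of tenants the stamp can host
    tenant_count: int = 0

    @property
    def free_slots(self) -> int:
        return self.capacity - self.tenant_count


def place_tenant(stamps: list[Stamp], region: str) -> Optional[Stamp]:
    """Pick the stamp in the region with the most free slots, or None if full."""
    candidates = [s for s in stamps if s.region == region and s.free_slots > 0]
    if not candidates:
        return None  # signal that a new stamp has to be deployed first
    chosen = max(candidates, key=lambda s: s.free_slots)
    chosen.tenant_count += 1
    return chosen
```

Returning `None` instead of raising keeps the "no capacity" case an ordinary outcome that the global control plane can react to, for example by requesting a new stamp.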

If you are close to the “share nothing” (fully isolated) model, having multiple tenant control planes makes a lot of sense, as resource management and maintenance are done at the tenant level. In this model, tenants are often highly customisable and configurable, and managing that customisation and configuration is another responsibility that can be placed in tenant control planes. And depending on the processes behind the tenant lifecycle events, a global control plane may not even be necessary.

Other combinations are also possible; the right one strongly depends on how your solution evolves.

Keep Evolving Your Control Plane!

As your solution evolves, so should your control plane. It will take on more and more cumbersome and error-prone tasks as they arise. It can help eliminate the need to go through jump boxes for certain administrative tasks. It can replace “visit every tenant database” scripts in data extraction scenarios. It really can do a lot. Of course, there is a risk of bloating your control plane and making it a maintenance nightmare. So always think about what new functionality you are adding to it, and whether there are features that no longer make sense and should be removed.

Pathfinder Engineer
