What is a Kubernetes Operator?

By Richard Lander, CTO | May 3, 2024

A Kubernetes operator is a custom extension of the Kubernetes control plane.

There are many open source Kubernetes operators available for various purposes, and operators can also be very helpful in managing the workloads you develop and deliver to your end users. This blog post will help you understand how they work and outline how you can tackle developing your own.

Kubernetes Control Plane

To understand Kubernetes operators, it is critical to understand how the Kubernetes control plane works in general. For this, see our blog post: What is the Kubernetes Control Plane?

Types of Operators

App Stack Management

This category of operators provides an abstraction for deploying and managing particular applications. They are most helpful with sophisticated stateful apps that are relatively involved to operate and with apps that have multiple workload components. With this kind of operator, a user often creates, updates, and deletes a single custom resource instance that triggers a Kubernetes controller to create, update, and delete all the Kubernetes resources that constitute that app. The app often consists of dozens of different resources, so this kind of operator is extremely helpful in reducing operational toil and improving consistency and reliability. Popular examples include the Prometheus Operator and various database operators. This is a very common category of Kubernetes operator.

[Diagram: Kubernetes operator app stack management]

External Integrations

This kind of operator uses custom resources to define systems external to Kubernetes such as cloud provider resources. The AWS Controllers for Kubernetes are a good example of this.

Workload Support Systems

Some operators don't directly manage entire app stacks or external resources, but instead provide configuration support services for applications. They often watch resources created by other systems and take config actions to support different workloads. A good example of this is cert-manager which is commonly used to manage SSL/TLS certificates based on Ingress resource configurations.

Components of an Operator

There are two main components to an operator.

  • The custom resource definition extends the API and provides a new resource kind with fields defined by the developer of the operator.
  • The custom controller is a piece of software that responds to users creating, updating, or deleting instances of a custom resource. It is responsible for reconciling the existing state to satisfy the desired state declared by the user in the custom resource.

Custom Resource Definitions

A custom resource definition (CRD) is a built-in resource in Kubernetes. It allows you to use an OpenAPI v3 schema to define the fields for a custom resource and some basic validation rules. When you create a CRD, you instantly extend the Kubernetes API to support a custom resource kind.

The CRD encapsulates the data model for your operator. When designing a CRD, focus on what pieces of information are required by your controller to make decisions about configuration for your use case. Start with a bare minimum of fields in your first iterations and add more as you find the need. Always be on the lookout for high level configuration values that can be evaluated by your controller to abstract away details of configuration.

Over-parameterization impacts the usability of your operator. If your custom resource accumulates dozens of required fields that necessitate deep configuration from the user, see if sensible defaults can be used to make them optional. Or find higher level configuration that can allow your operator to make the right decisions for the user.

Let's use the AppStack example from above. The following manifest is an example of requiring a user to set several granular configurations for:

  • Number of replicas on the primary app deployment.
  • Whether to use a Kubernetes horizontal pod autoscaler (HPA) to manage pod replicas according to load.
  • The resource requests for the primary workload.

apiVersion: example.com/v1beta1
kind: AppStack
metadata:
  name: dev-app-stack
spec:
  replicas: 2
  enableHorizontalScaling: false
  resources:
    requests:
      memory: "64Mi"
      cpu: "250m"

Instead, you could allow users to define a single high level config that lets the controller set all the granular values for them. In this example the replicas, the HPA and resource requests would all default to sensible values for each of the three tiers. You are unlikely to need the HPA for dev instances, but likely will in prod, for example.

apiVersion: example.com/v1beta1
kind: AppStack
metadata:
  name: dev-app-stack
spec:
  tier: dev # dev | staging | prod

When providing high level options like this, recognize that sometimes granular config is required to test certain features or solve problems. The best abstraction is one that allows high level config when useful, but retains the ability to provide granular overrides when needed. That would mean the more granular fields are optional and will override the defaults used when the spec.tier field is set.
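The override behavior described above could be sketched in Go roughly as follows. This is a minimal illustration, not code from a real operator: the type names, the `Resolve` function, and the default values per tier are all assumptions, and the nested `resources.requests` fields are flattened for brevity. Optional fields are modeled as pointers so the controller can tell "not set" apart from a zero value.

```go
package main

import "fmt"

// Tier is the high level setting a user chooses in spec.tier.
type Tier string

const (
	TierDev     Tier = "dev"
	TierStaging Tier = "staging"
	TierProd    Tier = "prod"
)

// AppStackSpec is a hypothetical Go view of the AppStack spec.
// Pointer fields are optional: nil means "use the tier default".
type AppStackSpec struct {
	Tier                    Tier
	Replicas                *int32
	EnableHorizontalScaling *bool
	MemoryRequest           *string
	CPURequest              *string
}

// ResolvedConfig holds the concrete values the controller would apply.
type ResolvedConfig struct {
	Replicas                int32
	EnableHorizontalScaling bool
	MemoryRequest           string
	CPURequest              string
}

// tierDefaults contains illustrative values only; sensible defaults
// depend entirely on the application being managed.
var tierDefaults = map[Tier]ResolvedConfig{
	TierDev:     {Replicas: 1, EnableHorizontalScaling: false, MemoryRequest: "64Mi", CPURequest: "250m"},
	TierStaging: {Replicas: 2, EnableHorizontalScaling: false, MemoryRequest: "128Mi", CPURequest: "500m"},
	TierProd:    {Replicas: 3, EnableHorizontalScaling: true, MemoryRequest: "256Mi", CPURequest: "1"},
}

// Resolve starts from the defaults for the tier and then applies any
// granular overrides the user set explicitly.
func Resolve(spec AppStackSpec) (ResolvedConfig, error) {
	cfg, ok := tierDefaults[spec.Tier]
	if !ok {
		return ResolvedConfig{}, fmt.Errorf("unknown tier %q", spec.Tier)
	}
	if spec.Replicas != nil {
		cfg.Replicas = *spec.Replicas
	}
	if spec.EnableHorizontalScaling != nil {
		cfg.EnableHorizontalScaling = *spec.EnableHorizontalScaling
	}
	if spec.MemoryRequest != nil {
		cfg.MemoryRequest = *spec.MemoryRequest
	}
	if spec.CPURequest != nil {
		cfg.CPURequest = *spec.CPURequest
	}
	return cfg, nil
}

func main() {
	// A dev instance that overrides only the replica count.
	replicas := int32(2)
	cfg, err := Resolve(AppStackSpec{Tier: TierDev, Replicas: &replicas})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

With this shape, a user who sets only the tier gets every default, while a user debugging a specific problem can pin a single field without giving up the rest of the abstraction.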

Custom Controllers

The custom controller contains the logic for configuring, deploying, updating, and deleting components of the software under management by the operator. In the app stack example, that would include all the resources that comprise the app stack.

The controller connects to the Kubernetes API and watches the custom resources that it is responsible for. In the diagrammed example above, the controller watches AppStack resources. When one of those resources is created, the controller deploys all the Kubernetes resources to spin up an instance of the app. When the AppStack resource is updated, the controller makes the necessary changes to the underlying Kubernetes resources. When the AppStack resource is deleted, all the underlying Kubernetes resources will be removed.
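The core of that behavior is a comparison between desired and actual state. The sketch below shows the idea using only the Go standard library. It is a simplified model rather than a working controller: the `Resource` and `Action` types and the `Reconcile` function are hypothetical stand-ins, and no Kubernetes API calls are made.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a stand-in for a Kubernetes object owned by an AppStack.
// Key identifies it (for example "Deployment/web") and Spec is a
// simplified representation of its configuration.
type Resource struct {
	Key  string
	Spec string
}

// Action is one change the controller would make through the API.
type Action struct {
	Verb string // "create", "update" or "delete"
	Key  string
}

// Reconcile compares desired state with actual state and returns the
// actions needed to make them match. Passing nil for desired models
// the AppStack itself having been deleted.
func Reconcile(desired, actual []Resource) []Action {
	want := map[string]string{}
	for _, r := range desired {
		want[r.Key] = r.Spec
	}
	have := map[string]string{}
	for _, r := range actual {
		have[r.Key] = r.Spec
	}

	var actions []Action
	for key, spec := range want {
		current, exists := have[key]
		switch {
		case !exists:
			actions = append(actions, Action{"create", key})
		case current != spec:
			actions = append(actions, Action{"update", key})
		}
	}
	for key := range have {
		if _, wanted := want[key]; !wanted {
			actions = append(actions, Action{"delete", key})
		}
	}

	// Map iteration order is random, so sort for stable output.
	sort.Slice(actions, func(i, j int) bool { return actions[i].Key < actions[j].Key })
	return actions
}

func main() {
	desired := []Resource{
		{Key: "Deployment/web", Spec: "replicas=2"},
		{Key: "Service/web", Spec: "port=80"},
	}
	actual := []Resource{
		{Key: "Deployment/web", Spec: "replicas=1"},
		{Key: "ConfigMap/old", Spec: "unused"},
	}
	for _, a := range Reconcile(desired, actual) {
		fmt.Println(a.Verb, a.Key)
	}
}
```

In a real operator this logic would run inside the controller's reconcile function and issue requests to the Kubernetes API. Deletion of child resources is commonly delegated to owner references and Kubernetes garbage collection rather than handled by hand.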

This encapsulates all the CRUD (Create, Read, Update, and Delete) operations that make up the essential functionality an operator must handle. This functionality will provide a vastly improved experience to users and significantly reduce complexity in automated delivery systems, e.g. GitOps pipelines, but this is just the beginning. You can now add advanced functionality that provides tremendous benefit for critical operations such as:

  • Backups: Provide automated backups for persistent data to safe storage locations.
  • Upgrades: Provide programmatically controlled version upgrade features that take backups, update database schemas, and update software versions. This provides incredible value in reducing stress and toil for your operations teams.
  • Rollbacks: Provide programmatically controlled rollbacks when upgrades go sideways. Another invaluable way to reduce stress for the people operating the software.
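One way a controller might sequence these operations is sketched below: back up first, attempt the upgrade, and restore if any step fails. The `Workload` interface, the `Upgrade` function, and the `fakeDB` type are all hypothetical and exist only for illustration. A real operator would implement each step against the software it manages.

```go
package main

import (
	"errors"
	"fmt"
)

// Workload is a hypothetical interface over the managed software.
type Workload interface {
	Version() string
	Backup() (backupID string, err error)
	MigrateSchema(target string) error
	SetVersion(version string) error
	Restore(backupID string) error
}

// Upgrade takes a backup, then migrates the schema and bumps the
// version. If either upgrade step fails, it restores the backup and
// returns the workload to its previous version.
func Upgrade(w Workload, target string) error {
	previous := w.Version()

	backupID, err := w.Backup()
	if err != nil {
		return fmt.Errorf("backup failed, upgrade not attempted: %w", err)
	}

	err = w.MigrateSchema(target)
	if err == nil {
		err = w.SetVersion(target)
	}
	if err == nil {
		return nil
	}

	// The upgrade failed, so roll back to the state captured above.
	if rbErr := w.Restore(backupID); rbErr != nil {
		return fmt.Errorf("upgrade failed (%v) and restore failed: %w", err, rbErr)
	}
	if rbErr := w.SetVersion(previous); rbErr != nil {
		return fmt.Errorf("upgrade failed (%v) and version reset failed: %w", err, rbErr)
	}
	return fmt.Errorf("upgrade to %s failed, rolled back to %s: %w", target, previous, err)
}

// fakeDB is an in-memory stand-in used to exercise Upgrade.
type fakeDB struct {
	version       string
	failMigration bool
	steps         []string
}

func (f *fakeDB) Version() string { return f.version }

func (f *fakeDB) Backup() (string, error) {
	f.steps = append(f.steps, "backup")
	return "backup-1", nil
}

func (f *fakeDB) MigrateSchema(target string) error {
	if f.failMigration {
		return errors.New("schema migration failed")
	}
	f.steps = append(f.steps, "migrate "+target)
	return nil
}

func (f *fakeDB) SetVersion(version string) error {
	f.version = version
	f.steps = append(f.steps, "set-version "+version)
	return nil
}

func (f *fakeDB) Restore(backupID string) error {
	f.steps = append(f.steps, "restore "+backupID)
	return nil
}

func main() {
	db := &fakeDB{version: "1.0", failMigration: true}
	fmt.Println("result:", Upgrade(db, "2.0"))
	fmt.Println("version:", db.version)
	fmt.Println("steps:", db.steps)
}
```

Keeping the backup as an unconditional first step means a failed backup stops the upgrade before anything has changed, which is usually the safer failure mode.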

Tools for Building Operators

The Kubernetes API is a RESTful API and client libraries exist for many different languages. However, Kubernetes is written in Go and, if you write your operator in Go, you can use the core Kubernetes libraries and all of the features available in them. Furthermore, Go is a good language for the purpose. Most developers can pick up Go pretty quickly and use one of the SDKs to get them started.

Kubebuilder

Kubebuilder is the most common tool used for building operators. The project has excellent documentation and provides all the scaffolding and boilerplate for almost any operator project.

Operator Builder

Operator Builder extends Kubebuilder and provides the opportunity to supply tested Kubernetes manifests to inform the code generation for your operator. Operator Builder provides significant time-saving when building an App Stack Management operator. It takes the Kubernetes manifests provided and generates all the Go code to define those resources. If you've ever had to code all the resource definitions in an operator project, you'll appreciate how much time can be saved. Operator Builder will generate all the code necessary for the CRUD operations in your controller.

Operator SDK

The Operator SDK offers the ability to develop an operator with Go, Ansible, or Helm. I have not seen a useful implementation using Ansible or Helm and would advise against it. Configuration tools do not have the capabilities of the Go programming language and, while they may be useful for a quick proof of concept in some cases, you'll likely find the limitations will require a rewrite in Go later anyway, especially when tackling advanced functionality beyond the CRUD operations.

Threeport

Threeport is not a Kubernetes operator. Rather, it is an application orchestrator that integrates very well with the operator pattern. We use operators as a part of Threeport and strongly advocate for their use on high value workloads. If you're managing cloud native applications, try out Threeport by downloading the CLI from GitHub, and see our docs for instructions to get started.

Qleet

If you like what you see in Threeport and want the features without the overhead of managing Threeport control planes yourself, check out Qleet. We provide managed Threeport control planes which are supported by the developers of Threeport.