Kubernetes is not the first platform that comes to mind to run Apache Kafka clusters. Indeed, Kafka’s strong dependency on storage can be a pain point given the way Kubernetes handles persistent storage. Kafka brokers are unique and stateful: how can we implement this in Kubernetes?
We will put a special focus on how to plug additional Kafka tools into a Strimzi installation.
We will also compare Strimzi with other Kafka operators by providing their pros and cons.
Strimzi is a Kubernetes Operator aiming at reducing the cost of deploying Apache Kafka clusters on cloud native infrastructures.
As an operator, Strimzi extends the Kubernetes API by providing resources to natively manage Kafka resources, including:
- Kafka clusters
- Kafka topics
- Kafka users
- Kafka MirrorMaker2 instances
- Kafka Connect instances
The project is currently at the “Sandbox” stage at the Cloud Native Computing Foundation.
Note: The CNCF website defines a “sandbox” project as “Experimental projects not yet widely tested in production on the bleeding edge of technology.”
With Strimzi, deploying a 3-broker TLS-encrypted cluster is as simple as applying the following YAML file:
    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: my-cluster
    spec:
      kafka:
        version: 3.2.3
        replicas: 3
        listeners:
          - name: plain
            port: 9092
            type: internal
            tls: false
          - name: tls
            port: 9093
            type: internal
            tls: true
        config:
          offsets.topic.replication.factor: 3
          transaction.state.log.replication.factor: 3
          transaction.state.log.min.isr: 2
          default.replication.factor: 3
          min.insync.replicas: 2
          inter.broker.protocol.version: "3.2"
        storage:
          type: jbod
          volumes:
            - id: 0
              type: persistent-claim
              size: 100Gi
              deleteClaim: false
            - id: 1
              type: persistent-claim
              size: 100Gi
              deleteClaim: false
      zookeeper:
        replicas: 3
        storage:
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
      entityOperator:
        # Empty objects deploy both operators with their default settings
        topicOperator: {}
        userOperator: {}
A topic looks like this:
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaTopic
    metadata:
      name: my-topic
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      partitions: 1
      replicas: 1
      config:
        retention.ms: 7200000
        segment.bytes: 1073741824

Both of these examples are from the examples directory of the Strimzi operator. This directory includes many more examples covering all of Strimzi’s capabilities.
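Users follow the same pattern. As a minimal sketch (the user name `my-user` and the ACL are hypothetical, and the `authorization` section assumes the Kafka cluster itself is configured with `simple` authorization, which the cluster example above is not), a SCRAM-SHA-512 user with read access to `my-topic` could be declared like this:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: my-user                      # hypothetical user name
  labels:
    strimzi.io/cluster: my-cluster   # must match the name of the Kafka resource
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple                     # assumes simple authorization is enabled on the cluster
    acls:
      - resource:
          type: topic
          name: my-topic
          patternType: literal
        operation: Read
        host: "*"
```

The User Operator then generates the credentials and stores them in a Kubernetes Secret named after the user.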
An interesting feature of Strimzi is its out-of-the-box security. By default, inter-broker communication is encrypted with TLS, while communication with ZooKeeper is both authenticated and encrypted with mTLS.
The Apache ZooKeeper clusters backing the Kafka instances are not exposed outside of the Kubernetes cluster, providing additional security.
Kubernetes comes with its own solution for managing distributed stateful applications: StatefulSets.
The official documentation states:
(StatefulSets) manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
While StatefulSets have the benefit of being Kubernetes native resources, they have some limitations.
Here are a few examples:
- Scaling up and down is linear. If you have a StatefulSet with 3 pods (pod-0, pod-1, pod-2), scaling up will create pod-3 and scaling down can only delete the pod with the highest ordinal. This can be an issue when you want to eliminate a particular pod of your deployment. Applied to Kafka, you might be in a situation where a bad topic makes a broker unstable: with StatefulSets, you cannot delete this particular broker and scale out a fresh one.
- All the pods share the same specs (CPU, Mem, # of PVCs, etc.)
- Critical node failure requires manual intervention
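To make these limitations concrete, here is a minimal sketch of what a hand-written StatefulSet for brokers could look like. The names, image and sizes are placeholders, and this is not what Strimzi generates. Note the single Pod template and the single volume claim template shared by every replica:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka                        # pods are named kafka-0, kafka-1, kafka-2
spec:
  serviceName: kafka-headless        # hypothetical headless Service
  replicas: 3                        # scaling down always removes the highest ordinal
  selector:
    matchLabels:
      app: kafka
  template:                          # one template: identical CPU and memory for all brokers
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: registry.example.org/kafka:3.2.3   # placeholder image
          resources:
            requests:
              cpu: "2"
              memory: 8Gi
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka
  volumeClaimTemplates:              # one template: same number and size of PVCs per broker
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
```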
The Strimzi team addressed these limitations by developing its own resource: the StrimziPodSet, a feature introduced in Strimzi 0.29.0.
The benefits of using StrimziPodSets include:
- Scaling up and down is more flexible
- Per broker configuration
- Opens the gate for broker specialization once ZooKeeper-less Kafka is GA (KIP-500, more on this topic later in the article)
A drawback of using StrimziPodSets is that the Strimzi Operator instance becomes critical.
If you want to hear more about the Strimzi PodSets, feel free to watch the StrimziPodSets – What it is and why should you care? video by Jakub Scholz.
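In Strimzi 0.29.0 the feature sits, as far as we know, behind the `UseStrimziPodSets` feature gate and has to be switched on through the `STRIMZI_FEATURE_GATES` environment variable of the Cluster Operator; later releases enable it by default. The fragment below is a sketch of the relevant part of the operator Deployment, assuming the default `strimzi-cluster-operator` name from the installation files:

```yaml
# Fragment only: merge into the existing Cluster Operator Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: strimzi-cluster-operator     # assumed default name
spec:
  template:
    spec:
      containers:
        - name: strimzi-cluster-operator
          env:
            - name: STRIMZI_FEATURE_GATES
              value: "+UseStrimziPodSets"   # "+" enables a gate, "-" disables it
```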
Strimzi’s Quickstart documentation is complete and functional.
We will focus the rest of the article on practical needs that are not covered by Strimzi.
Strimzi brings a lot of comfort for users when it comes to managing Kafka resources in Kubernetes. We wanted to bring something to the table by showing how to deploy a Kafka UI on top of a Strimzi cluster as a native Kubernetes resource.
There are multiple open source Kafka UI projects on GitHub, to cite a few:
Let’s go for Kafka UI, which has the cleanest UI (IMO) among the competition.
The following YAML is an example of a Kafka UI instance configured over a SCRAM-SHA-512 authenticated Strimzi Kafka cluster. The UI is authenticated against an OpenLDAP via
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cluster-kafka-ui
      namespace: kafka
    spec:
      selector:
        matchLabels:
          app: cluster-kafka-ui
      template:
        metadata:
          labels:
            app: cluster-kafka-ui
        spec:
          containers:
            - image: provectuslabs/kafka-ui:v0.4.0
              name: kafka-ui
              ports:
                - containerPort: 8080
              env:
                # Connection to the Strimzi-managed Kafka cluster
                - name: KAFKA_CLUSTERS_0_NAME
                  value: "cluster"
                - name: KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS
                  value: "cluster-kafka-bootstrap:9092"
                - name: KAFKA_CLUSTERS_0_PROPERTIES_SECURITY_PROTOCOL
                  value: SASL_PLAINTEXT
                - name: KAFKA_CLUSTERS_0_PROPERTIES_SASL_MECHANISM
                  value: SCRAM-SHA-512
                - name: KAFKA_CLUSTERS_0_PROPERTIES_SASL_JAAS_CONFIG
                  value: 'org.apache.kafka.common.security.scram.ScramLoginModule required username="admin" password="XSnBiq6pkFNp";'
                # LDAP authentication of the UI users
                - name: AUTH_TYPE
                  value: LDAP
                - name: SPRING_LDAP_URLS
                  value: ldaps://myldapinstance.company:636
                # {0} is replaced with the login entered by the user
                - name: SPRING_LDAP_DN_PATTERN
                  value: "uid={0},ou=People,dc=company"
                - name: SPRING_LDAP_ADMINUSER
                  value: uid=admin,ou=Apps,dc=company
                - name: SPRING_LDAP_ADMINPASSWORD
                  value: Adm1nP@ssw0rd!
                - name: JAVA_OPTS
                  value: "-Djdk.tls.client.cipherSuites=TLS_RSA_WITH_AES_128_GCM_SHA256 -Djavax.net.ssl.trustStore=/etc/kafka-ui/ssl/truststore.jks"
              volumeMounts:
                - name: truststore
                  mountPath: /etc/kafka-ui/ssl
                  readOnly: true
          volumes:
            - name: truststore
              secret:
                secretName: myldap-truststore
Note: By leveraging a PLAINTEXT internal listener on port 9092, we don’t need to provide a
With this configuration, users need to authenticate via LDAP to the Kafka UI. Once they are logged in, the underlying user used for interactions with the Kafka cluster is the admin user defined in KAFKA_CLUSTERS_0_PROPERTIES_SASL_JAAS_CONFIG. Role-based access control was recently introduced with this issue.
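Hardcoding the Kafka password in the Deployment is convenient for a demo but not ideal. As a sketch, assuming the `admin` user is managed as a `KafkaUser` resource (so that the User Operator stores its generated password under the `password` key of a Secret named `admin` in the same namespace), the JAAS configuration can reference the Secret instead. The variable name `KAFKA_ADMIN_PASSWORD` is our own choice:

```yaml
# Fragment of the Kafka UI container spec (sketch, not a complete Deployment)
env:
  # Assumption: Secret "admin" with key "password", as created by the
  # Strimzi User Operator for a KafkaUser named "admin"
  - name: KAFKA_ADMIN_PASSWORD       # hypothetical variable name
    valueFrom:
      secretKeyRef:
        name: admin
        key: password
  # $(KAFKA_ADMIN_PASSWORD) is expanded by Kubernetes because the variable
  # is declared earlier in this list
  - name: KAFKA_CLUSTERS_0_PROPERTIES_SASL_JAAS_CONFIG
    value: 'org.apache.kafka.common.security.scram.ScramLoginModule required username="admin" password="$(KAFKA_ADMIN_PASSWORD)";'
```

The same approach would apply to the LDAP admin password, provided it is stored in a Secret of its own.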
We had a functional need to deploy a Schema Registry instance for our Kafka clusters running in Kubernetes.
While Strimzi goes the extra mile by managing additional tools like Kafka Connect or MirrorMaker instances, it is not yet capable of deploying a Schema Registry.
To mitigate this issue, the Rubin Observatory Science Quality and Reliability Engineering team worked on the strimzi-registry-operator.
The configuration we used is the one showcased in the example section of the README.
The only issue we encountered was that the operator is not yet capable of deploying a Schema Registry backed by a SCRAM-SHA-512 secured cluster.
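Whichever way the registry is deployed, it stores its schemas in a compacted, single-partition Kafka topic, which can itself be managed by Strimzi. A sketch, where the topic name `registry-schemas` and the cluster name `my-cluster` are assumptions to adapt to your setup:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: registry-schemas             # assumed name; must match the registry configuration
  labels:
    strimzi.io/cluster: my-cluster   # assumed cluster name
spec:
  partitions: 1                      # a single partition keeps schema updates ordered
  replicas: 3
  config:
    cleanup.policy: compact          # keep the latest record per key instead of expiring data
```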
After many years of work on KIP-500, the Apache Kafka team finally announced that running Kafka in KRaft mode (ZooKeeper-less) is production ready. The announcement was made as part of the Kafka 3.3 release.
I think calling it production ready for new clusters is a bit strange. It means that we would need to maintain two parallel code paths with guaranteed upgrades etc. for possibly a long time. So, TBH, I hoped we would have much more progress at this point in time and be more prepared for ZooKeeper removal. But as a my personal opinion – I would be probably very reluctant to call anything at this stage production ready anyway.
Following these comments, we can guess that ZooKeeper-less Kafka is not going to be the default configuration in Strimzi in the next release (0.34.0 at the time of writing), but it will definitely happen at some point.
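For experimentation, Strimzi exposes KRaft mode behind a `UseKRaft` feature gate, set through the `STRIMZI_FEATURE_GATES` environment variable of the Cluster Operator. The fragment below is a sketch based on our reading of the Strimzi documentation at the time of writing: the gate is marked as alpha, is not meant for production, and its requirements may change between releases.

```yaml
# Fragment of the Cluster Operator container spec (sketch)
env:
  - name: STRIMZI_FEATURE_GATES
    # Assumption: the UseStrimziPodSets gate is already enabled,
    # as KRaft mode depends on it
    value: "+UseKRaft"
```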
Storage is often a pain point with bare metal Kubernetes clusters and Kafka is no exception.
The community consensus for provisioning storage on Kubernetes is Ceph with Rook, though other solutions exist (Longhorn or OpenEBS on the open source side, Portworx or Linstor as proprietary solutions).
Comparing storage engines for bare metal Kubernetes clusters is too big a topic to be included in this article, but feel free to check out our previous article “Ceph object storage within a Kubernetes cluster with Rook” for more on Rook.
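On the Strimzi side, pointing broker volumes at a Rook-provisioned StorageClass only takes the `class` field of a `persistent-claim` volume. In the fragment below, `rook-ceph-block` is the StorageClass name used in Rook's own examples and is an assumption here; use whatever name your cluster defines:

```yaml
# Fragment of a Kafka resource (sketch): spec.kafka.storage
storage:
  type: jbod
  volumes:
    - id: 0
      type: persistent-claim
      size: 100Gi
      class: rook-ceph-block         # assumed StorageClass backed by Ceph RBD
      deleteClaim: false
    - id: 1
      type: persistent-claim
      size: 100Gi
      class: rook-ceph-block
      deleteClaim: false
```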
We did have the opportunity to compare the performance of a 3-broker Kafka installation running on Strimzi with Rook Ceph against a 3-broker Kafka cluster running on the same machines with direct disk access.
Here are the specs and results of the benchmark.
Kubernetes environment:
- Kafka Version 3.2.0 on Kubernetes through Strimzi
- 3 brokers (one pod per node)
- 6 RBD devices per broker (provisioned by the Rook Ceph Storage Class)
- Xms java default (2g)
- Xmx java default (29g)
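The heap sizes above were left at their defaults in our tests. Should you want to pin them explicitly, Strimzi exposes them through `jvmOptions` in the `Kafka` resource. A sketch of the relevant fragment, reusing the values listed above:

```yaml
# Fragment of a Kafka resource (sketch): spec.kafka.jvmOptions
jvmOptions:
  "-Xms": "2g"    # initial heap size
  "-Xmx": "29g"   # maximum heap size
```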
Bare metal environment:
- Kafka Version 3.2.0 as JVM process with the Apache release
- 3 brokers (one JVM per node)
- 6 disks per broker (JBOD with ext4 formatting)
- Xms java default (2g)
- Xmx java default (29g)
Notes: The benchmarks were run on the same machines (HP Gen 7 with 192 GB RAM and 6 x 2 TB disks) running RHEL 7.9. Kubernetes was not running while Kafka ran as a JVM process, and vice versa.

    kafka-producer-perf-test \
      --topic my-topic-benchmark \
      --record-size 1000 \
      --throughput -1 \
      --producer.config /mnt/kafka.properties \
      --num-records 50000000

Note: The topic my-topic-benchmark has 100 partitions and 1 replica.
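On the Strimzi side, such a topic can be declared as a `KafkaTopic`. A sketch, assuming the cluster is named `my-cluster` as in the earlier examples:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic-benchmark
  labels:
    strimzi.io/cluster: my-cluster   # assumed cluster name
spec:
  partitions: 100
  replicas: 1                        # no replication, so the test measures raw write throughput
```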
We ran the previous benchmark 10 times on each configuration and averaged the results:
| Metric | JBOD bare metal | Ceph RBD | Performance difference |
| --- | --- | --- | --- |
| Records/sec | 75223 | 65207 | -13.3 % |
| Avg latency (ms) | 1.45 | 1.28 | +11.1 % |
The results are interesting: while write throughput was better on JBOD, average latency was lower with Ceph.
There are two main alternatives to Strimzi when it comes to operating Kafka on Kubernetes:
- Koperator
- The Confluent operator
We did not test Koperator thoroughly so it would be unfair to compare it to Strimzi in this article.
As for the Confluent operator, it provides many features that we don’t have with Strimzi. Here are a few that we deemed interesting:
- Schema Registry integration
- ksqlDB integration
- LDAP authentication support
- Out-of-the-box UI (Confluent Control Center) for both admins and developers
- Tiered storage
All of these come at the cost (literally) of buying a commercial license from Confluent. Note that the operator and Control Center can be tested during a 30-day trial period.