Natively Compatible with Apache Kafka

Many products in the industry are compatible with the Apache Kafka® protocol, such as Redpanda and Kafka on Pulsar. AutoMQ believes that reimplementing the Kafka protocol from scratch can hardly achieve complete compatibility in every detail, and that it also involves a significant amount of repetitive, unnecessary engineering effort. The Kafka protocol currently has 113 error codes and 68 APIs, and the Fetch API alone has 15 versions, so achieving 100% compatibility with the Kafka protocol and semantics is extremely challenging. Keeping up with the protocol as Apache Kafka® evolves is a further challenge. Because compatibility with the Kafka protocol and semantics is a critical consideration for users choosing a Kafka product, AutoMQ’s architecture is designed to be 100% compatible with the Apache Kafka® protocol and semantics and to stay continuously aligned with Apache Kafka®.
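To make the versioning burden concrete, here is a minimal sketch of per-API version negotiation: each side advertises the range of versions it supports for an API, and the client uses the highest version both sides understand. The helper names and the version ranges below are illustrative assumptions, not code from any Kafka client. A compatible implementation has to handle this correctly for every API and every version.

```python
from typing import NamedTuple


class VersionRange(NamedTuple):
    """Inclusive range of versions supported for one API (hypothetical helper)."""
    min_version: int
    max_version: int


def negotiate_version(client: VersionRange, broker: VersionRange) -> int:
    """Return the highest version both sides support, or raise if none exists."""
    lowest = max(client.min_version, broker.min_version)
    highest = min(client.max_version, broker.max_version)
    if lowest > highest:
        raise ValueError("no mutually supported version")
    return highest


# Illustrative numbers only: an older client talking to a newer broker.
old_client_fetch = VersionRange(0, 11)
new_broker_fetch = VersionRange(4, 15)
chosen = negotiate_version(old_client_fetch, new_broker_fetch)
print(chosen)
```

Every version in the overlapping range may differ in request and response fields, which is why a from-scratch implementation must reproduce the behavior of each one rather than only the latest.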

Current State of the Apache Kafka® Protocol

Apache Kafka® has been under development for more than 10 years, with contributions from more than 1,000 contributors and 1059 KIPs [1]. The codebase contains hundreds of thousands of lines of code, incorporating numerous features, optimizations, and fixes. Building Kafka-compatible protocol and semantics from scratch would not only require extensive development effort but would also be error-prone. The Apache Kafka® architecture is composed of a compute layer and a storage layer:

Compute Layer: Constitutes 98% of the total codebase and carries Kafka’s API protocol and features. The compute layer also has numerous system optimizations tailored for stream storage, such as end-to-end batching and zero-copy mechanisms, enabling 1 GiB/s throughput with just 2 CPU cores.
Storage Layer: Makes up 2% of the total codebase and is responsible for the high-durability storage of messages. As a stream processing pipeline, Apache Kafka® stores vast amounts of data over time. Most of the cost of an Apache Kafka® cluster comes from data storage and from the machines provisioned for its coupled compute-storage deployment.

AutoMQ Natively Supports the Kafka Protocol

AutoMQ aims to move Apache Kafka® to a shared storage architecture by separating compute from storage. The optimal solution is to replace Kafka’s storage layer while retaining its native compute layer. The advantages of this approach include:

It allows for the reuse of 98% of Apache Kafka’s compute layer code, ensuring API protocol and semantic compatibility as well as feature alignment.
It enables the replacement of the storage layer with cloud-native storage services, leveraging the technical and cost benefits of shared storage and cloud-native technologies.

Although Apache Kafka exposes a stream abstraction modeled by Partitions at the business logic layer, internally Kafka’s log recovery, transaction indexing, timestamp indexing, and read operations are all based on Log Segments. The Log Segment is therefore the smallest operational unit of Kafka storage, which makes the Segment the optimal cut point for implementing compute-storage separation in AutoMQ. By implementing shared Segment semantics on top of S3Stream, AutoMQ can reuse the upper-layer LocalLog, LogCleaner, and Partition logic, maximizing the reuse of Apache Kafka code.

Beyond native support for the Kafka protocol at the architectural level, AutoMQ has also passed Apache Kafka’s 500+ system test cases (KRaft mode). This test suite covers Kafka functionality (message sending/receiving, consumer management, Topic Compaction, etc.), client compatibility (>= 0.9), operations (partition reassignment, rolling restart, etc.), as well as Stream and Connector testing. Passing it demonstrates AutoMQ’s 100% protocol and semantic compatibility in practice.
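The idea of cutting at the Segment level can be illustrated with a small sketch. It assumes a hypothetical `Segment` contract with two interchangeable backends: a local one and one that delegates to a shared append-only stream. All names here (`Segment`, `LocalSegment`, `SharedStream`, `StreamSegment`, `Log`) are invented for illustration and are not Kafka or AutoMQ classes; `SharedStream` is only an in-memory stand-in for a shared stream service such as S3Stream, whose real API is not shown here.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Tuple


class Segment(ABC):
    """Hypothetical segment contract that the upper-layer log logic depends on."""

    @abstractmethod
    def append(self, record: bytes) -> int:
        """Store one record and return its offset relative to the segment start."""

    @abstractmethod
    def read(self, offset: int, max_records: int) -> List[bytes]:
        """Return up to max_records records starting at the relative offset."""

    @abstractmethod
    def size(self) -> int:
        """Return the number of records stored in the segment."""


class LocalSegment(Segment):
    """Stand-in for a segment kept on a broker's local disk."""

    def __init__(self) -> None:
        self._records: List[bytes] = []

    def append(self, record: bytes) -> int:
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int) -> List[bytes]:
        return self._records[offset:offset + max_records]

    def size(self) -> int:
        return len(self._records)


class SharedStream:
    """In-memory stand-in for a shared, append-only stream service (assumption)."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[bytes]] = {}

    def append(self, stream_id: str, record: bytes) -> int:
        records = self._streams.setdefault(stream_id, [])
        records.append(record)
        return len(records) - 1

    def fetch(self, stream_id: str, start: int, end: int) -> List[bytes]:
        return self._streams.get(stream_id, [])[start:end]

    def length(self, stream_id: str) -> int:
        return len(self._streams.get(stream_id, []))


class StreamSegment(Segment):
    """Segment whose data lives in the shared stream instead of on local disk."""

    def __init__(self, stream: SharedStream, stream_id: str) -> None:
        self._stream = stream
        self._stream_id = stream_id

    def append(self, record: bytes) -> int:
        return self._stream.append(self._stream_id, record)

    def read(self, offset: int, max_records: int) -> List[bytes]:
        return self._stream.fetch(self._stream_id, offset, offset + max_records)

    def size(self) -> int:
        return self._stream.length(self._stream_id)


class Log:
    """Upper-layer logic: rolls segments and maps log offsets, unaware of the backend."""

    def __init__(self, factory: Callable[[int], Segment], max_segment_records: int) -> None:
        self._factory = factory
        self._max = max_segment_records
        self._segments: List[Tuple[int, Segment]] = [(0, factory(0))]

    def append(self, record: bytes) -> int:
        base, segment = self._segments[-1]
        if segment.size() >= self._max:  # roll to a new segment when full
            base += segment.size()
            segment = self._factory(base)
            self._segments.append((base, segment))
        return base + segment.append(record)

    def read(self, offset: int) -> bytes:
        for base, segment in reversed(self._segments):
            if offset >= base:
                found = segment.read(offset - base, 1)
                if found:
                    return found[0]
                break
        raise IndexError(f"offset {offset} out of range")


def fill(log: Log) -> List[int]:
    return [log.append(f"msg-{i}".encode()) for i in range(5)]


shared = SharedStream()
local_log = Log(lambda base: LocalSegment(), max_segment_records=2)
shared_log = Log(lambda base: StreamSegment(shared, f"segment-{base}"), max_segment_records=2)
local_offsets = fill(local_log)
shared_offsets = fill(shared_log)
print(local_offsets == shared_offsets, shared_log.read(3))
```

In this sketch the `Log` class is identical for both backends; only the segment factory changes. That mirrors, in a very simplified form, why swapping the implementation beneath the Segment boundary lets the logic above it stay untouched.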

References

[1]. Apache Kafka KIP List: https://cwiki.apache.org/confluence/display/kafka/kafka+improvement+proposals
