ChakrDB: A Distributed RocksDB Born in the Cloud, Part 1

History, Context, and Motivation

  • Part 1: Covers the reasoning and theory behind building a distributed KVS in the cloud. We describe how to avoid design complexities while achieving resilience, HA, consistency, seamless dev-ops workflows, and so on.
  • Part 2: Provides in-depth coverage of the ChakrDB architecture, benchmarking results, and the projected roadmap for the product.

Objects Metadata Layer Requirements

  • Scale and performance. We built Objects with a very large scale in mind. Because some of the initial use cases for Objects involved secondary storage or backup data, we needed to use deep-storage nodes (120–300 TB capacity per node) and minimal CPU and memory resources to optimize for cost. With 120–300 TB of data, a single node at full capacity would also have multiple terabytes of metadata. But optimizing Objects for capacity and cost didn’t mean that we would compromise on performance. We also wanted to make Objects really fast, so it can run high-performance workloads like artificial intelligence (AI) and machine learning (ML), data lakes, big data analytics, and so on. As virtually all cloud applications use a blob store for persistent storage, we knew that the metadata layer needed to have low latency and high throughput, even at massive scale and with enormous working set sizes. A small-scale minimum viable product (MVP) wouldn’t work at all.
  • Scale-out distributed system. Objects started with a stateless microservices model that stores all objects in the underlying highly available Nutanix distributed file system. The metadata layer handles all the state and sharding constructs for Objects. To scale Objects out, we needed a scale-out metadata layer as well, which meant we had to build a distributed KVS or a stateful distribution layer.
  • Strict consistency and strong resilience. For Objects to be an enterprise-ready storage system, the metadata layer also needed read-after-read and read-after-write consistency guarantees. At the same time, the system had to be highly resilient in case of any form of hardware domain failure or data loss.
  • Transactional capabilities. Objects needed at least a shard-level transactional capability, as there are multiple atomic metadata operations.
  • Cloud native. Because we built Objects as a cloud-native service on our K8s-based MSP, our distributed KVS also needed to use K8s constructs for its compute and memory workflows.
  • Seamless automated cloud workflows. For Objects to be infrastructure software that can run on any platform, the workflows admins use to scale out instances or scale up resources had to be consumer-grade — a core Nutanix design principle.
  • Any cloud or remote storage. We wanted the KVS to be able to run on any cloud, multiple clouds, or any remote storage. This requirement isn’t specifically related to Objects but is part of future-proofing the KVS design for use across all Nutanix cloud products. When designing a system, it’s always good to consider where else you can use it in the future.
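
To make the sharding and shard-level transaction requirements above more concrete, here is a minimal in-memory sketch in Python. It is not ChakrDB or Objects code: the class names (Shard, ShardedKVS), the hash-based shard routing, and the key layout are all assumptions made for illustration. It shows only the general idea that several metadata updates routed to one shard become visible together, and that a read issued after the write sees the new values.

```python
import hashlib
import threading


class Shard:
    """One hypothetical metadata shard: an in-memory map guarded by a lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        # Reads take the same lock as writes, so a read that starts after
        # apply_batch() returns always observes the whole batch.
        with self._lock:
            return self._data.get(key)

    def apply_batch(self, puts, deletes=()):
        """Apply deletes, then puts, as one all-or-nothing update."""
        with self._lock:
            # Stage changes on a copy so a failure part-way through
            # leaves the visible state untouched (simple, not efficient).
            staged = dict(self._data)
            for key in deletes:
                staged.pop(key, None)
            staged.update(puts)
            self._data = staged


class ShardedKVS:
    """Routes each shard key to one of a fixed number of shards (assumed scheme)."""

    def __init__(self, num_shards):
        self._shards = [Shard() for _ in range(num_shards)]

    def shard_for(self, shard_key):
        # Stable hash so the same shard key always maps to the same shard.
        digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]


kvs = ShardedKVS(num_shards=4)
shard = kvs.shard_for("bucket-a")

# Two related metadata records written as a single shard-level transaction.
shard.apply_batch({"bucket-a/obj1": "v1", "bucket-a/obj1/part0": "p0"})

# Replace one record and drop another in the same atomic step.
shard.apply_batch({"bucket-a/obj1": "v2"}, deletes=["bucket-a/obj1/part0"])
```

A real distributed KVS would also need durability, replication, and a way to move shards between nodes, none of which this sketch attempts.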

Design Considerations

  1. Use a new open-source system for the use case.
  2. Reuse or repackage an existing system.
  3. Build a new product.

Use a New Open-Source System

Reuse Existing Nutanix Cassandra

Build a New Product

The Way Forward

We make infrastructure invisible, elevating IT to focus on the applications and services that power their business.
