Effortless data locality with Storidge

What is “data locality”, and why does it matter for container clusters?

For containers, local storage access reduces latency and improves application response times. Databases are a good example: low-latency storage access lets them deliver consistent transaction processing. For this reason, it is ideal to schedule containers and pods on the nodes where their data is located. Today, DevOps teams have to use techniques such as labels, constraints, or affinity/anti-affinity rules to ensure data locality. However, these rules are hard to manage when running applications at scale.
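For context, here is what the manual approach typically looks like with Docker Swarm. The node name, label, and service below are made up for illustration; the point is simply that the service has to be pinned to the node that holds its data.

```shell
# Label the node that holds the database's data (node and label names are examples)
docker node update --label-add storage=db1 worker-node-1

# Pin the database service to that node with a placement constraint
docker service create \
  --name mysql \
  --constraint 'node.labels.storage == db1' \
  --mount type=volume,source=mysql-data,target=/var/lib/mysql \
  mysql:8.0
```

Every such rule has to be kept in sync with where the data actually lives, which is exactly what becomes unmanageable at scale.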

In this blog, we’ll show an alternative solution using Storidge’s Data Locality feature and explain how it maintains consistent high performance and low latency for applications running in a container cluster.

The diagram below shows four Storidge nodes with locally attached storage and a containerized app.

The container app is reading a file with three extents from a persistent volume on a Storidge node. Since Storidge’s CIO software keeps a copy of the data on storage locally attached to the node (with copies of the extents spread across other nodes), this read access is very fast: the read I/O is served from a local drive, usually an SSD.

As a result, for read I/O:

  1. Latency is very low, as the read I/O never leaves the Storidge node
  2. Greater network bandwidth is available for other traffic, e.g. write I/Os, management, etc.
  3. Reduced contention, as local reads are less likely to collide with read I/Os from other nodes
  4. No dependency on external storage, e.g. iSCSI or FC SAN

But containers are generally short-lived instances! What happens when a container is rescheduled to another node? This can happen for several reasons:

  1. The persistent volume is passed around as part of a sequence of tasks, so the scheduler starts the next task on a new node
  2. The app “parks” data on a persistent volume between restarts, then the scheduler restarts the app on a different node
  3. Node maintenance: services on the node being maintained are drained, and the scheduler restarts instances elsewhere to maintain the desired state of the service (see the example after this list)
  4. Node failure: the scheduler detects the failure and restarts instances on the remaining nodes to maintain the desired state of the service
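Node maintenance in particular is a routine event. In Docker Swarm it is typically triggered like this (the node name is just an example); every service instance on the drained node is stopped and rescheduled elsewhere, and its persistent volumes follow it:

```shell
# Drain a node for maintenance; Swarm reschedules its tasks onto other nodes
docker node update --availability drain worker-node-3

# After maintenance, return the node to the scheduling pool
docker node update --availability active worker-node-3
```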

The diagram below shows what happens after a Storidge volume is rescheduled to a new node together with the container.

As you can see, after the container is rescheduled to node4, only the orange data extent is moved (copied) to the new node, triggered by a host I/O. If the other data extents (green, purple) on remote nodes are requested, they too are transparently moved to the new node for the container. Once a data extent is local, all read I/Os are served locally, as is one copy of each write I/O.

Avoiding bulk transfers of all data extents on a volume minimizes the impact on network bandwidth, which is usually the scarcest resource in a cluster. Leaving “inactive” data extents on remote nodes does not impact app performance, whereas moving that inactive data would add network overhead without any benefit. In the cloud, the added network efficiency means more work can be accomplished with fewer instances, saving money.

Why automated data locality matters for container clusters

With Storidge’s CIO software, optimizing data placement for local node access happens automatically, requires no configuration from operations, and works with any application. No operational effort is needed to deliver consistent high performance and low latency for applications. That is why data locality matters for orchestrated environments with highly mobile application containers: it is a very simple way to save both time and money.
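To make the contrast concrete, a deployment backed by a Storidge volume needs no placement rules at all. The sketch below assumes the Storidge volume plugin is registered under the `cio` driver name; the volume and service names are just examples.

```shell
# Create a persistent volume backed by Storidge (driver name "cio" assumed)
docker volume create --driver cio mysql-data

# Deploy the service with no labels, constraints, or affinity rules;
# active data extents follow the container to whichever node it lands on
docker service create \
  --name mysql \
  --mount type=volume,source=mysql-data,target=/var/lib/mysql,volume-driver=cio \
  mysql:8.0
```

Compare this with the constraint-based example at the start of the post: there is nothing here to keep in sync as containers move around the cluster.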