
Longhorn Distributed File System For Kubernetes In Action

Sep 19



Real-Life Problem

After installing a Kubernetes cluster in your own private cloud, you've come to realize that your application cannot store data. That is because, by default, Kubernetes does not come with a storage solution. At most, you can configure local storage, but that doesn't work in production: data stored on local storage is neither available nor persisted across pods running on different nodes. You might be tempted to mount NFS, but it has limitations; it only works for applications that do not require heavy data processing, such as WordPress.


A viable solution is called a "Distributed File System": in layman's terms, a storage technology distributed across the nodes that form a cluster. This produces a storage solution that is fault tolerant and shared across the nodes in Kubernetes. A technology like Longhorn.


What is Longhorn?

Longhorn is cloud-native distributed block storage for Kubernetes that provides persistent storage for stateful applications. In simple terms, Longhorn lets you create disks that are spread across many nodes, so that data stays local and available to pods wherever they run.


I assume that you have already installed Longhorn so you can follow my demo. If not, you may visit this article: Install Longhorn on Kubernetes.
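For reference, a typical Helm-based installation (per the Longhorn documentation) looks roughly like this; longhorn-system is the conventional namespace:

helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace
kubectl get pods -n longhorn-system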


Real-Life Use Case of Longhorn Distributed File System

Consider a scenario where your company needs you to deploy a database in Kubernetes. The setup should include one pod specifically for writing and two additional pods for reading (a StatefulSet-style application). A key requirement is that the database must be optimized for both writing and reading. Deploying the database in Kubernetes as a StatefulSet running multiple pod replicas across nodes, backed by a distributed file system, achieves this requirement.


Solution and Demonstration of Longhorn Distributed File System in Action

To keep things straightforward, we're using a log file to simulate database operations. One pod handles writing, while two pods are dedicated to reading. A Longhorn volume will be set up and linked to a Persistent Volume and a Persistent Volume Claim. This claim will be mounted into all three pods at a specific directory, which is where the log is stored.


As the writer pod appends to the log file, the other two pods on different nodes read it simultaneously, reproducing the scenario above. A monitoring dashboard shows the performance of the pods and the read/write throughput of the disks. Refer to the image below for our configuration.


Image 1: Pods Using Persistent Volume Claim mounted in Longhorn Volume


Create a Longhorn Volume and Persistent Volume Claim

Step 1: Create Volume

First, we create a volume in Longhorn called test-volume-01 using the Longhorn Dashboard. Ensure the access mode is set to ReadWriteMany, allowing multiple pods to read from and write to the storage at the same time. Note that we have set the replica count to 3, meaning the volume is replicated across 3 Kubernetes nodes.


Longhorn Volume Creation

Longhorn Volumes
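As a side note, if you prefer dynamic provisioning over clicking through the dashboard, a Longhorn StorageClass can encode the same settings. A minimal sketch, with an illustrative name:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-3-replicas # illustrative name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3" # same replica count as the dashboard setup
  staleReplicaTimeout: "30"

A PVC that references this class would then get a 3-replica Longhorn volume provisioned automatically.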

Step 2: Attach the Volume to a Persistent Volume and Persistent Volume Claim

After creating the volume, attach it to a Persistent Volume and a Persistent Volume Claim. We will use the claim later to mount our storage into the pods of a Deployment.

Longhorn Persistent Volume Claim
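If you would rather do this step with kubectl than through the dashboard, the manifests look roughly like the sketch below. The PV and PVC names and the 2Gi size are illustrative; volumeHandle must match the Longhorn volume created in Step 1:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-volume-01-pv # illustrative name
spec:
  capacity:
    storage: 2Gi # match the size of the Longhorn volume
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: longhorn
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeHandle: test-volume-01 # the Longhorn volume from Step 1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-volume-01-pvc
  namespace: development # must match the Deployment's namespace
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
  volumeName: test-volume-01-pv # bind to the PV above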

Mount the volume into a Deployment through the PVC

We then create a Deployment with 3 pod replicas and mount the PVC created earlier in Longhorn. These pods will run the write and read log scripts. See the Deployment manifest below. It is important to note that the PVC used in the Deployment is the one created earlier in the Longhorn Dashboard, and that the Deployment and PVC must be in the same namespace.


Create deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: frontend
  name: frontend
  namespace: development
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - image: httpd
        name: httpd
        ports:
        - containerPort: 80
        resources: {}
        volumeMounts:
          - mountPath: /data
            name: test-volume-01-pvc
      volumes:
        - name: test-volume-01-pvc
          persistentVolumeClaim:
            claimName: test-volume-01-pvc # the PVC created earlier in the Longhorn Dashboard

Provision the Deployment

kubectl apply -f deployment.yaml
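Optionally, wait for the rollout to complete:

kubectl rollout status deployment/frontend -n development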

Check the Pods created by the Deployment

kubectl get pods -n development
Kubernetes Pods
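Before moving on, you can also confirm that the claim is bound to the Longhorn volume:

kubectl get pvc test-volume-01-pvc -n development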

Deploy Writer and Reader to the Pods


Deploy a Writer Script to a Pod


Log in to the first pod

kubectl exec -n development -it frontend-84845856dd-66t8h -- bash

Install the vi editor

apt-get update && apt-get install -y vim

Create the writer script in the first pod

cd /data
vi writer
#!/bin/bash

LOG_FILE="/data/logfile.txt"
MESSAGES=("DATABASE WRITE SUCCESSFUL" "DATABASE DELETE SUCCESSFUL" "DATABASE UPDATE SUCCESSFUL" "DATABASE READ SUCCESSFUL")

while true; do
    RANDOM_MSG=${MESSAGES[$RANDOM % ${#MESSAGES[@]}]}
    echo "$(date +'%Y-%m-%d %H:%M:%S') $RANDOM_MSG" >> "$LOG_FILE"
	echo "Write successful"
    sleep 1 # Adjust for desired log generation frequency
done
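If you would rather not install an editor, the same file can be created with a heredoc; the quoted EOF keeps $RANDOM and $(date ...) from being expanded while writing the file:

cat > /data/writer <<'EOF'
#!/bin/bash

LOG_FILE="/data/logfile.txt"
MESSAGES=("DATABASE WRITE SUCCESSFUL" "DATABASE DELETE SUCCESSFUL" "DATABASE UPDATE SUCCESSFUL" "DATABASE READ SUCCESSFUL")

while true; do
    RANDOM_MSG=${MESSAGES[$RANDOM % ${#MESSAGES[@]}]}
    echo "$(date +'%Y-%m-%d %H:%M:%S') $RANDOM_MSG" >> "$LOG_FILE"
    echo "Write successful"
    sleep 1 # Adjust for desired log generation frequency
done
EOF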

Run the writer script

chmod +x writer
./writer

Deploy a Reader Script to the Pods

Log in to the two remaining pods and read the log using the tail command.


Log in to the second pod

kubectl exec -n development -it frontend-84845856dd-86kd8 -- bash

Read the Log

cd /data
tail -f logfile.txt

Log in to the third pod

kubectl exec -n development -it frontend-84845856dd-r7ws5 -- bash

Read the Log

cd /data
tail -f logfile.txt

Actual Demonstration and Simulation

Longhorn Read and Write Database Simulation

In this video we logged in to three (3) different pods on different nodes in the cluster. The first pod simulates database writes, while the second and third pods mimic database reads. Without a distributed file system like Longhorn, data locality and persistence are not possible for pods running on different nodes. Furthermore, since the Persistent Volume Claim is backed by the distributed file system, replicas of the data are kept across the three nodes.
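If you want to verify the replica placement from the command line, Longhorn exposes its objects as Kubernetes CRDs; listing the replicas shows which node each copy landed on:

kubectl get replicas.longhorn.io -n longhorn-system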


Conclusion

Longhorn is a perfect solution for Kubernetes persistent storage when you want the following:

  • A distributed storage system that runs well with your cloud native application (without relying on external providers).

  • A storage solution that is tightly integrated with Kubernetes.

  • Storage that is highly available and durable.

  • A storage system that requires no specialized hardware and is not external to the cluster.

  • A storage system that is easy to install and manage.




