I’ve been diving a bit deeper into distributed systems. On a daily basis I work with Kubernetes, Kafka and a few different blockchains. All these distributed systems are quite complex and follow different architectures, but one thing they have in common is that they are mostly referred to as “clusters”. So what is a cluster? Why do we use them? What are some examples of clusters?

What are Computer Clusters?

Wikipedia defines them quite simply: “A computer cluster is a set of computers that work together so that they can be viewed as a single system” (Computer cluster, 2022). In a few words: we have more than one computer sharing a specific goal and behaving as one computer. In Kubernetes’ case the goal is to schedule containers and manage services; in Kafka’s case, to manage events and process streams of data. Additionally, Hadoop HDFS or GlusterFS clusters provide distributed, shared storage so that applications have high-throughput, highly available access to data.

The computers in a cluster are commonly interconnected through a network such as a Local Area Network (LAN). This is the common case even in the cloud, where resources are usually gathered in virtual networks. However, depending on the needs of the services, clusters may be distributed across different data centers around the world to keep their services available.

Each node runs its own operating system instance, which manages that node’s local resources. On top of that, clusters commonly have shared resources, such as shared storage, and software that controls and schedules tasks on every node. Equally important is how a cluster is architected. For instance, in a Beowulf configuration the application never sees the computational nodes, because it only interacts with the “Main” node, which handles all scheduling and management of the dependent nodes. On the other hand, there are peer-to-peer models such as Bitcoin, where every node is equal and no main node coordinates the others. Instead, the nodes agree on a shared state through a consensus mechanism called Proof of Work (Nakamoto, 2009).

Beowulf configuration
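To make the Proof of Work idea mentioned above a bit more concrete, here is a minimal sketch in Python. It is a simplification and not Bitcoin's actual algorithm: Bitcoin hashes a binary block header twice with SHA-256 and compares it against a numeric target, while this sketch hashes a string once and counts leading zero hex digits. The names `mine`, `verify` and `difficulty` are my own, chosen for illustration.

```python
import hashlib


def mine(block_data: str, difficulty: int):
    """Search for the first nonce whose hash starts with `difficulty` zero hex digits.

    Simplified illustration only: real Bitcoin mining uses double SHA-256
    over a binary block header and a numeric target.
    """
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Check a claimed nonce with a single hash computation."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    nonce, digest = mine("hello cluster", difficulty=3)
    print(nonce, digest, verify("hello cluster", nonce, 3))
```

The point of the asymmetry is that finding a nonce takes many attempts, while any other node can check the result with one hash. That is what lets equal peers accept a block without trusting whoever produced it.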

Communication protocols and algorithms

In order to schedule tasks, provide high availability and reach consensus, nodes must share information about themselves, report their health status to the main node and, in some cases, choose a new “Leader” to continue working if the main node has stopped working. I would like to mention some of the protocols and tools used for this:

  • Paxos

    It is a family of protocols for solving consensus in a network of connected processors that may fail or fall out of sync. It was first submitted in 1989 and published as a journal article in 1998.

  • Raft

    This is also a consensus algorithm, designed to be easier to understand and work with than Paxos. It is equivalent to Paxos in fault tolerance and performance.

  • Open MPI

    MPI stands for Message Passing Interface. It is a standard for message passing used to develop parallel and distributed programs, and its implementations act as a form of inter-process communication (IPC) that lets nodes share data and send commands to each other. Open MPI is one specific open source implementation.
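The leader election step described at the start of this section can be sketched in a few lines of Python. This is only loosely modelled on Raft's voting rule (one vote per node per term, and a majority of the whole cluster wins). It leaves out everything else real Raft needs, such as log comparison, randomized election timeouts and actual network calls. The `Node` class and `run_election` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: str
    alive: bool = True
    current_term: int = 0
    voted_for: Optional[str] = None

    def request_vote(self, candidate_id: str, term: int) -> bool:
        """Grant at most one vote per term; a dead node never answers."""
        if not self.alive:
            return False
        if term > self.current_term:
            # A newer term resets the vote cast in the older term.
            self.current_term = term
            self.voted_for = None
        if term == self.current_term and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: Node, cluster: List[Node]) -> bool:
    """The candidate starts a new term and wins with a strict majority of the cluster."""
    term = candidate.current_term + 1
    votes = sum(node.request_vote(candidate.node_id, term) for node in cluster)
    return votes > len(cluster) // 2


if __name__ == "__main__":
    nodes = [Node(f"n{i}") for i in range(5)]
    nodes[0].alive = False  # the old leader has stopped working
    print(run_election(nodes[1], nodes))
```

Requiring a majority of the full cluster, and not just of the nodes that answered, is the design choice that matters here: two isolated halves of a cluster can never both elect a leader in the same term.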

Final words

I just wanted to share my little cluster, where I practice and test different distributed systems and protocols, such as Kubernetes. With a Raspberry Pi or something like the Turing Pi, learning, designing and developing distributed systems has become more approachable for students and software engineers.

My little cluster

References

Silberschatz, A., & Gagne (2018). Operating system concepts.
Computer cluster. (2022). Retrieved from https://en.wikipedia.org/w/index.php?title=Computer_cluster
N.A. (n.d.). Retrieved from https://grpc.io/docs/
N.A. (n.d.). Retrieved from https://ebpf.io/
N.A. (n.d.). Retrieved from https://www.open-mpi.org/faq/?category=general
Nakamoto (2009). Bitcoin: A peer-to-peer electronic cash system. Retrieved from http://www.bitcoin.org/bitcoin.pdf
Paxos (computer science). (2022). Retrieved from https://en.wikipedia.org/w/index.php?title=Paxos_(computer_science)&oldid=1104521232
N.A. (n.d.). Retrieved from https://raft.github.io/
N.A. (n.d.). Retrieved from https://doc.rust-lang.org/stable/book/