Apache Spark Development

How to set up a Raspberry Pi 3 Cluster Computing System (Intro)

In April 2016, I enrolled in my final module of the JCSE’s Continued Professional Development (CPD) programme. This year’s project was Big Data and Visualisation. Most of the groups opted for cloud-based computing platforms like Azure and Amazon to process their data. The group I was in, however, decided to go low-level and build our own cluster computing hardware using Raspberry Pi 3s.

In this multi-part series of posts, I will show you what is required to:

  • build your own 6-node Pi cluster (1 master, 5 slaves)
  • power them all (plus an 8-port network switch) from one ATX PC power supply unit (PSU)
  • install Raspbian Jessie Lite (the operating system) on the Pis, configure them with static IP addresses, and use the master node as an NFS file server
  • install and configure Apache Spark on the nodes in the cluster (the cluster computing framework software)
  • write and run a Scala application within the cluster

Each of the steps above has a ton of information associated with it, and so will require a separate post to cover it in detail. I will try to illustrate what I’m doing as much as I can, using diagrams, screenshots, photos and maybe even video, to point out some of the many gotchas that tripped me up along the way.

I will supply the code for the bash scripts I use to start and stop Apache Spark, the configuration files I use to constrain Apache Spark in the memory-limited environment of a Raspberry Pi 3, the Scala code for some of the applications we’ll run in the cluster, as well as some additional scripts to benchmark the Raspberry Pis once they’re set up and powered by the ATX PC PSU.

Part one of this multi-part series should be completed during November 2016, with more parts to follow, hopefully, once I gain some momentum.


