{"id":47,"date":"2016-05-05T13:59:44","date_gmt":"2016-05-05T13:59:44","guid":{"rendered":"http:\/\/netdev.co.za\/blog\/?p=47"},"modified":"2017-06-01T06:48:15","modified_gmt":"2017-06-01T04:48:15","slug":"fixing-apache-spark-1-6-x-false-error-message-for-slave-startup","status":"publish","type":"post","link":"https:\/\/netdev.co.za\/blog\/fixing-apache-spark-1-6-x-false-error-message-for-slave-startup\/","title":{"rendered":"Fixing Apache Spark 1.6.x false error message for slave startup"},"content":{"rendered":"<p>I&#8217;ve been setting up an Apache Spark standalone cluster on a bunch of Raspberry Pis for a tertiary education project.<\/p>\n<p><a href=\"http:\/\/netdev.co.za\/blog\/wp-content\/uploads\/2016\/05\/IMG_20160427_100834-e1462470730712.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-54\" src=\"http:\/\/netdev.co.za\/blog\/wp-content\/uploads\/2016\/05\/IMG_20160427_100834-e1462470730712.jpg\" alt=\"IMG_20160427_100834\" width=\"600\" height=\"800\" srcset=\"https:\/\/netdev.co.za\/blog\/wp-content\/uploads\/2016\/05\/IMG_20160427_100834-e1462470730712.jpg 600w, https:\/\/netdev.co.za\/blog\/wp-content\/uploads\/2016\/05\/IMG_20160427_100834-e1462470730712-225x300.jpg 225w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>When you&#8217;re running a slave on the same machine as the master, spinning up a slave instance works without a hitch. 
However, when you try to start a slave on a remote machine (even after creating a user with the same name on each node, generating a key with ssh-keygen and copying it to the slaves with ssh-copy-id), you&#8217;re likely to run into the following error message:<\/p>\n<pre class=\"brush: plain; highlight: [2]; title: ; notranslate\" title=\"\">\r\nnode02: starting org.apache.spark.deploy.worker.Worker, logging to \/srv\/spark-1.6.1-bin-hadoop2.6\/logs\/spark-spark-org.apache.spark.deploy.worker.Worker-1-node02.out\r\nnode02: failed to launch org.apache.spark.deploy.worker.Worker:\r\nnode02: full log in \/srv\/spark\/spark-1.6.1-bin-hadoop2.6\/logs\/spark-spark-org.apache.spark.deploy.worker.Worker-1-node02.out\r\n<\/pre>\n<p>On line two, you&#8217;ll see &#8220;failed to launch org.apache.spark.deploy.worker.Worker:&#8221;, with no error message after the colon. Even stranger, the slave\/worker actually started correctly! After a couple of seconds it shows up as registered on the master node.<\/p>\n<p>So, what&#8217;s going on? There&#8217;s an error, but there isn&#8217;t an error. The slave itself starts up fine; the problem is in the script that starts it, which reports a failure that never happened.<\/p>\n<p>If you open up <code>sbin\/spark-daemon.sh<\/code> in your Apache Spark installation directory, you&#8217;ll find a line (167 on my installation) that says:<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\nif [[ ! $(ps -p &quot;$newpid&quot; -o comm=) =~ java ]]; then\r\n<\/pre>\n<p>This is the script&#8217;s check that the slave started successfully: it asks <code>ps<\/code> for the command name of the process it has just launched (<code>$newpid<\/code>) and treats anything that doesn&#8217;t match <code>java<\/code> as a failure.<\/p>\n<p>This is where the error lies. The slave is eventually run by Java, but the process at <code>$newpid<\/code> starts out as bash: it runs the <code>bin\/spark-class<\/code> launcher script, which only hands over to the java command once it has built the full command line. When the slave is started remotely by the master node via ssh, the check can run while that process is still bash, so the expression in the if statement above doesn&#8217;t match and the script reports a failure even though the worker comes up moments later.<\/p>\n<p>A simple solution to this problem is to modify the if statement so that it accepts bash as well as java:<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\nif [[ ! $(ps -p &quot;$newpid&quot; -o comm=) =~ java|bash ]]; then\r\n<\/pre>\n<p>Save the file on each slave node (the check runs in the copy of <code>sbin\/spark-daemon.sh<\/code> on the machine where the slave is started), and from then on you should get clean startup messages every time.<\/p>\n<p>I&#8217;m thinking of making a pull request to the Apache Spark source to include this. I will update this post if it&#8217;s accepted.<\/p>\n<p>Please leave a comment if this has helped you.<\/p>\n<p>G.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been setting up an Apache Spark standalone cluster on a bunch of Raspberry Pis for a tertiary education project. When you&#8217;re running a slave on the same machine as the master, spinning up a slave instance works without a hitch. 
However, when trying to start up a slave on a remote machine (even after having &hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[6],"tags":[],"_links":{"self":[{"href":"https:\/\/netdev.co.za\/blog\/wp-json\/wp\/v2\/posts\/47"}],"collection":[{"href":"https:\/\/netdev.co.za\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/netdev.co.za\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/netdev.co.za\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/netdev.co.za\/blog\/wp-json\/wp\/v2\/comments?post=47"}],"version-history":[{"count":9,"href":"https:\/\/netdev.co.za\/blog\/wp-json\/wp\/v2\/posts\/47\/revisions"}],"predecessor-version":[{"id":193,"href":"https:\/\/netdev.co.za\/blog\/wp-json\/wp\/v2\/posts\/47\/revisions\/193"}],"wp:attachment":[{"href":"https:\/\/netdev.co.za\/blog\/wp-json\/wp\/v2\/media?parent=47"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/netdev.co.za\/blog\/wp-json\/wp\/v2\/categories?post=47"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/netdev.co.za\/blog\/wp-json\/wp\/v2\/tags?post=47"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}