How to...


Use pyspark

Prerequisites

A running Spark standalone cluster whose master listens on spark://192.168.17.1:7077, and a Spark installation under ~/spark on the node you are working from.

Starting the pyspark-shell

~/spark/bin/pyspark --master spark://192.168.17.1:7077

Connecting to the cluster from the python-shell

Open a python-shell on the master node

On the command line:

python 

Set up the SparkContext inside the python-shell

Inside the python-shell:

from pyspark import SparkConf
from pyspark import SparkContext

conf = SparkConf()
conf.setMaster('spark://192.168.17.1:7077')
conf.setAppName('THE_NAME_OF_YOUR_APP')
sc = SparkContext(conf=conf)

Now you can use the variable 'sc' to run computations on the cluster

Tags
howto machinelearning master pyspark python shell Difficulty: advanced