
Wednesday, March 16, 2016

Install PySpark on Windows and with PyCharm!!!

Build a Spark environment for the Windows command line.
Paths added in this section go into the Windows environment variables by default.

1. Install the Java JDK to ..\Java\jdk1.8.0_74

Add to PATH: ..\Java\jdk1.8.0_74;..\Java\jdk1.8.0_74\bin


2. Install Scala to ..\scala

Add to PATH: ..\scala\bin
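
A quick sanity check (my own sketch, not part of the original steps): in a NEW command window, both tools should print a version banner. Once Python is available (step 3), the same check can be scripted:

    import subprocess

    # Each call should print a version banner if the PATH entries above took effect.
    subprocess.call("java -version", shell=True)
    subprocess.call("scala -version", shell=True)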


3. DO NOT install a standalone Python 3!!!

   Instead, install Anaconda3 and add its path at the very FRONT of PATH!!
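
To confirm that the Anaconda interpreter is the one picked up first on PATH, a one-liner (again a sketch of mine) helps:

    import sys

    # Should print a path inside the Anaconda3 install directory.
    print(sys.executable)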



4. Download the latest prebuilt Spark to ..\Spark\spark-1.6.1

Add to PATH: ..\Spark\spark-1.6.1;..\Spark\spark-1.6.1\bin
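
A quick check (a sketch; spark-submit ships in spark-1.6.1\bin) that the new PATH entry is picked up:

    import subprocess

    # Should print the Spark version banner; a warning about a missing
    # winutils.exe may appear until step 5 is done.
    subprocess.call("spark-submit --version", shell=True)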


5. Set SPARK_HOME = ..\Spark\spark-1.6.1

    Download the Hadoop "winutils.exe" helper files.

    Set HADOOP_HOME = ..\hadoop\hadoop-common-2.2.0-bin-master
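
Both variables can be verified from Python (a sketch of mine; note that a NEW console is needed before freshly set environment variables become visible):

    import os

    # Both should print the directories configured above.
    print(os.environ.get("SPARK_HOME"))
    print(os.environ.get("HADOOP_HOME"))

    # winutils.exe must sit in %HADOOP_HOME%\bin for Spark to start cleanly.
    hadoop_home = os.environ.get("HADOOP_HOME", "")
    print(os.path.isfile(os.path.join(hadoop_home, "bin", "winutils.exe")))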

Now Java, Scala, and Spark can all be run from the command line in Windows (for example, spark-shell starts the Scala shell and pyspark the Python one).

======================================================================
Next, go further and set up the PyCharm IDE.
Paths added in this section go into the current project's settings under Run\Edit Configurations.

6. Install PyCharm


7. Add these environment variables in the PyCharm IDE:
PYTHONPATH  =  ..\Spark\spark-1.6.1\python;..\Spark\spark-1.6.1\python\lib\py4j-0.9-src.zip;..\hadoop\hadoop-common-2.2.0-bin-master\bin

HADOOP_HOME  =  ..\hadoop\hadoop-common-2.2.0-bin-master

SPARK_HOME  =  ..\Spark\spark-1.6.1
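
As an alternative to setting PYTHONPATH in the IDE (my own sketch, relying on the SPARK_HOME variable from step 5), the same Spark entries can be prepended to sys.path at the top of each script:

    import os
    import sys

    # Mirror the PYTHONPATH entries from step 7.
    spark_home = os.environ["SPARK_HOME"]
    sys.path.insert(0, os.path.join(spark_home, "python"))
    sys.path.insert(0, os.path.join(spark_home, "python", "lib", "py4j-0.9-src.zip"))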


=======================================================================
To be further tested and simplified!
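
For a first end-to-end test inside PyCharm, a minimal script (my own sketch; the app name is arbitrary) run with the settings above should print 55:

    from pyspark import SparkConf, SparkContext

    # local[*] runs Spark in-process with one worker thread per core.
    conf = SparkConf().setMaster("local[*]").setAppName("PyCharmSmokeTest")
    sc = SparkContext(conf=conf)

    rdd = sc.parallelize([1, 2, 3, 4, 5])
    print(rdd.map(lambda x: x * x).sum())  # 1 + 4 + 9 + 16 + 25 = 55

    sc.stop()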