
1. Spark Shell on Client

scala> var rdd = sc.parallelize(1 to 100, 3)
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> rdd.count
res0: Long = 100

scala> val rdd2 = rdd.map(_ * 2)   // placeholder function; the original argument was truncated, any Int => Int fits the type shown below
rdd2: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at <console>:26

scala> rdd2.take(5)                // placeholder argument; the original value was truncated
res1: Array[Int] = Array(...)

scala> val rdd1 = sc.textFile("file://home/hadoop/apps/sparkwc")   // only two slashes, so the URI is wrong; corrected below
rdd1: org.apache.spark.rdd.RDD[String] = file://home/hadoop/apps/sparkwc MapPartitionsRDD[3] at textFile at <console>:24

scala> val rdd1 = sc.textFile("file:///home/hadoop/apps/sparkwc")
rdd1: org.apache.spark.rdd.RDD[String] = file:///home/hadoop/apps/sparkwc MapPartitionsRDD[9] at textFile at <console>:24

scala> val rdd2 = rdd    // <TAB> pressed here; the REPL lists the possible completions:
rdd   rdd1   rdd2   rdd3   rddToDatasetHolder

scala> val rdd2 = rdd1.flatMap(_.split(" "))
rdd2: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[10] at flatMap at <console>:26

scala> val rdd3 = rdd2.map((_, 1))
rdd3: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[11] at map at <console>:28

scala> val rdd4 = rdd3.reduceByKey(_ + _)
rdd4: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[12] at reduceByKey at <console>:30

scala> rdd4.collect
res2: Array[(String, Int)] = Array(...)    // the word counts; the values were truncated in the original

scala> rdd4.saveAsTextFile("file:///home/hadoop/apps/out1")

[hadoop@hadoop01 apps]$ cd out1/
[hadoop@hadoop01 out1]$ ls
part-00000  _SUCCESS
[hadoop@hadoop01 out1]$ cat part-00000
[hadoop@hadoop01 out1]$ pwd
/home/hadoop/apps/out1

Web UI address: the running application's Spark UI, which by default listens on port 4040 of the driver host while the shell is open.

This guide is based on a CentOS 7 environment. Starting from user creation, it walks through the full process from environment configuration to the Hadoop installation itself, using ZooKeeper as the coordination service for the cluster.


2. Spark Shuffle

  • Shuffle Write: writes each task's intermediate results to local disk
  • Shuffle Read: pulls the data produced in the Shuffle Write phase into memory for parallel computation (the wordcount example is revisited after the figure below)

    Figure 1: Spark Shuffle Write
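Relating this to the wordcount example in section 1: the shuffle sits at reduceByKey. Each task computing rdd3 partitions its (word, 1) pairs by key and writes them to local disk (Shuffle Write); each task computing rdd4 then fetches its partition of that data from every map-side task and aggregates it in memory (Shuffle Read).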

1. System Environment Configuration

Create a hadoop user and a hadoop group, and set a password for the hadoop user.

[root@localhost local]# groupadd hadoop  # add the hadoop group

Add the hadoop user

[root@localhost local]# useradd hadoop  # add the hadoop user

Assign the hadoop user to the hadoop group

[root@localhost local]# usermod -g hadoop hadoop

Set the hadoop user's password

[root@localhost local]# passwd hadoop 

Verify the group membership

[root@localhost local]# groups hadoop
hadoop : hadoop

Grant the hadoop user root privileges by editing the /etc/sudoers file and adding an authorization entry

[root@localhost local]# vim /etc/sudoers

Below the existing entry for root, append a matching line for hadoop, for example:

root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL

Whenever the hadoop user needs root privileges, simply prefix the command with sudo; a quick example follows.
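As a small illustration (reusing the firewall commands that appear later in this guide, run here as the hadoop user instead of root):

[hadoop@localhost ~]$ sudo systemctl stop firewalld.service
[hadoop@localhost ~]$ sudo systemctl disable firewalld.service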

Configure the host name mappings

[root@localhost hadoop]# vim /etc/hosts

Add the following hostname entries:

192.168.159.20 hadoop01
192.168.159.21 hadoop02
192.168.159.22 hadoop03
192.168.159.23 hadoop04
192.168.159.24 hadoop05

Adjust these IP/hostname entries to match your own virtual machine cluster plan.


3. Shuffle Write (hash-based)

  • Total number of files produced in the Shuffle Write phase = MapTaskNum * ReduceTaskNum (see the worked example after the figure below)
  • TotalBufferSize = CoreNum * ReduceTaskNum * FileBufferSize
  • A huge number of small files is produced and more memory is used for buffers, causing unnecessary memory overhead and increasing disk I/O and network overhead

    Figure 2: Shuffle Write
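To put illustrative numbers on this (the task and core counts are made up for the example; 32 KB is the usual default of spark.shuffle.file.buffer): with 100 map tasks and 100 reduce tasks, hash-based shuffle write produces 100 * 100 = 10,000 files, and an executor with 4 cores needs 4 * 100 * 32 KB = 12.5 MB of write buffers.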

2. Installing the Basic Environment

[root@localhost local]# yum install -y gcc

[root@localhost local]# yum install -y lrzsz

Find any existing JDK packages

[root@localhost local]# rpm -qa|grep java

Remove the bundled OpenJDK

[root@localhost local]# yum remove -y java-1.*

Upload the JDK tarball and unpack it

[root@localhost java]# tar -zxvf jdk-8u144-linux-x64.tar.gz 

Create a symbolic link

[root@localhost java]# ln -s /home/hadoop/apps/java/jdk1.8.0_144/ /usr/local/java

Configure the environment variables

vim /etc/profile

Append the following at the end of the file:

export JAVA_HOME=/usr/local/java
export PATH=${JAVA_HOME}/bin:$PATH

Reload the environment variables

[root@localhost java]# source /etc/profile

[root@localhost java]# yum install xinetd telnet telnet-server -y
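These telnet packages are presumably only needed for simple connectivity checks between nodes; for example, once ZooKeeper is running later in this guide, a command such as telnet hadoop01 2181 would confirm that its client port is reachable.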


4. Shuffle Write (hash-based, optimized)

  • Total number of files produced in the Shuffle Write phase = CoreNum * ReduceTaskNum (see the worked example after the figure below)
  • TotalBufferSize = CoreNum * ReduceTaskNum * FileBufferSize; the number of small files is reduced, but the amount of memory used for buffers stays the same
  • How to enable it

    • In code: conf.set("spark.shuffle.manager", "hash")
    • Or add spark.shuffle.manager=hash to the conf/spark-default.conf configuration file

      Figure 4: Optimized Shuffle Write
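Continuing the illustrative numbers from above (100 map tasks, 100 reduce tasks, 4 cores, 32 KB file buffer): with consolidation each core keeps one group of files per reduce task, so the file count drops from 10,000 to 4 * 100 = 400, while the buffer memory stays at 4 * 100 * 32 KB = 12.5 MB.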

3. Installing the ZooKeeper Cluster

Download: Zookeeper-3.4.10

Hostname  IP              Deployed software
hadoop01  192.168.159.20  zookeeper
hadoop02  192.168.159.21  zookeeper
hadoop03  192.168.159.22  zookeeper

Three machines in total, each running one zookeeper process.

Unpack the installation package

[hadoop@hadoop01 zookeeper]$ tar -zxvf zookeeper-3.4.10.tar.gz 

Exit the hadoop user, switch to root, and create a symbolic link for ZooKeeper

[root@hadoop01 zookeeper-3.4.10]# ln -s /home/hadoop/apps/zookeeper/zookeeper-3.4.10 /usr/local/zookeeper

As root, edit /etc/profile and add the following:

export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=${JAVA_HOME}/bin:$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZOOKEEPER_HOME}/bin

Reload the environment variables

[root@hadoop01 zookeeper-3.4.10]# source /etc/profile

Change the owner of the zookeeper link to hadoop

[root@hadoop01 zookeeper-3.4.10]# chown -R hadoop:hadoop /usr/local/zookeeper

Switch back to the hadoop user and edit the zookeeper configuration files, located in
/usr/local/zookeeper/conf

[root@hadoop01 zookeeper-3.4.10]# exit
exit
[hadoop@hadoop01 zookeeper]$ cd /usr/local/zookeeper/conf/
[hadoop@hadoop01 conf]$ ls
configuration.xsl  log4j.properties  zoo_sample.cfg
[hadoop@hadoop01 conf]$ cp zoo_sample.cfg zoo.cfg

Edit zoo.cfg and add the following:

dataDir=/usr/local/zookeeper/data      # snapshot directory
dataLogDir=/usr/local/zookeeper/log    # transaction log directory
# hadoop01/hadoop02/hadoop03 are the hostnames of the zookeeper machines; adjust to your own VMs
server.1=hadoop01:2888:3888            # (hostname, peer-communication port, leader-election port)
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888

TIPS: remove the comments shown above from zoo.cfg so that zookeeper parses the startup configuration correctly.
Then, based on this configuration, create the two corresponding directories; only the hadoop user should have write permission to them.

[hadoop@hadoop01 zookeeper]$ mkdir -m 755 data
[hadoop@hadoop01 zookeeper]$ mkdir -m 755 log

Create a myid file in the data directory; its content is the id number of this node

[hadoop@hadoop01 zookeeper]$ cd data/
[hadoop@hadoop01 data]$ ls
[hadoop@hadoop01 data]$ touch myid
[hadoop@hadoop01 data]$ echo 1 > myid

Use scp to copy the installation to the /home/hadoop/apps/zookeeper directory on the other two nodes, hadoop02 and hadoop03 (create /home/hadoop/apps/zookeeper on those nodes beforehand).

[hadoop@hadoop01 zookeeper]$ scp -r /home/hadoop/apps/zookeeper/zookeeper-3.4.10 hadoop@hadoop02:/home/hadoop/apps/zookeeper
[hadoop@hadoop01 zookeeper]$ scp -r /home/hadoop/apps/zookeeper/zookeeper-3.4.10 hadoop@hadoop03:/home/hadoop/apps/zookeeper

Edit the myid file in the data directory on each node: hadoop02's myid contains 2 and hadoop03's contains 3.

As root, repeat the zookeeper setup steps on hadoop02 and hadoop03: create the symbolic link, change the owner of the directory, and configure the environment variables (a consolidated sketch follows).
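A minimal sketch of those per-node steps, assuming the same paths as on hadoop01 (shown for hadoop02; write 3 instead of 2 into myid on hadoop03):

sudo ln -s /home/hadoop/apps/zookeeper/zookeeper-3.4.10 /usr/local/zookeeper
sudo chown -R hadoop:hadoop /usr/local/zookeeper
echo 2 > /usr/local/zookeeper/data/myid
# then, as root, append to /etc/profile and reload it with source /etc/profile:
#   export ZOOKEEPER_HOME=/usr/local/zookeeper
#   export PATH=$PATH:${ZOOKEEPER_HOME}/bin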

Create a start-all script, zkStart-all.sh, with the following content:

#!/bin/bash
echo "start zkserver..."
for i in 1 2 3
do
ssh hadoop0$i "source /etc/profile;/usr/local/zookeeper/bin/zkServer.sh start"
done
echo "zkServer started!"

Create a matching one-click stop script, zkStop-all.sh, with the following content:

#!/bin/bash
echo "stop zkserver..."
for i in 1 2 3
do
ssh hadoop0$i "source /etc/profile;/usr/local/zookeeper/bin/zkServer.sh stop"
done
echo "zkServer stopped!"

As the hadoop user, generate an SSH key pair on the hadoop01 node:

[hadoop@hadoop01 local]$ ssh-keygen -t rsa

Press Enter through the prompts to finish generating the key, then copy the public key to hadoop01, hadoop02 and hadoop03:

[hadoop@hadoop01 local]$ ssh-copy-id -i hadoop01
[hadoop@hadoop01 local]$ ssh-copy-id -i hadoop02
[hadoop@hadoop01 local]$ ssh-copy-id -i hadoop03

After the keys are copied, verify that passwordless login works, for example from hadoop01 to hadoop02:

[hadoop@hadoop01 bin]$ ssh hadoop02

Place the zkStart-all.sh and zkStop-all.sh scripts in the zookeeper installation directory on the hadoop01 node, e.g. /usr/local/zookeeper/bin, and make them executable:

[hadoop@hadoop01 bin]$ chmod -R +x zkStart-all.sh
[hadoop@hadoop01 bin]$ chmod -R +x zkStop-all.sh

Start the ZooKeeper cluster from the zookeeper bin directory

[hadoop@hadoop01 bin]$ ./zkStart-all.sh 

Check the result: if a QuorumPeerMain process appears on each node, the cluster started successfully.

[hadoop@hadoop01 bin]$ jps
7424 Jps
7404 QuorumPeerMain
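To check all three nodes in one go, a small loop over the passwordless SSH set up earlier also works:

[hadoop@hadoop01 bin]$ for i in 1 2 3; do echo hadoop0$i; ssh hadoop0$i "source /etc/profile; jps"; done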


5. Shuffle Write (sort-based)

  • Total number of files produced in the Shuffle Write phase = MapTaskNum * 2 (see the worked example after the figure below)
  • Advantages: sequential reads and writes greatly improve disk I/O performance, far fewer small files are produced, the memory used for file buffers drops, and memory utilization improves.
  • Disadvantage: one extra coarse-grained sort.
  • How to enable it
  • In code: conf.set("spark.shuffle.manager", "sort")
  • Or add spark.shuffle.manager=sort to the conf/spark-default.conf configuration file

    Figure 5: sort-based Shuffle Write
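With the same illustrative 100 map tasks as before, sort-based shuffle write produces only 100 * 2 = 200 files (each map task writes one sorted data file plus one index file), regardless of the number of reduce tasks.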

4. Installing the Hadoop Cluster

The table below shows the Hadoop cluster plan:

Hostname  IP              Installed software      Running processes
hadoop01  192.168.159.20  JDK, Hadoop, Zookeeper  NameNode, DFSZKFailoverController, ResourceManager, QuorumPeerMain (zookeeper)
hadoop02  192.168.159.21  JDK, Hadoop, Zookeeper  NameNode, DFSZKFailoverController, ResourceManager, QuorumPeerMain (zookeeper), JobHistory
hadoop03  192.168.159.22  JDK, Hadoop, Zookeeper  DataNode, NodeManager, JournalNode, QuorumPeerMain (zookeeper)
hadoop04  192.168.159.23  JDK, Hadoop             DataNode, NodeManager, JournalNode
hadoop05  192.168.159.24  JDK, Hadoop             DataNode, NodeManager, JournalNode

As the hadoop user, upload the hadoop tarball and unpack it

[hadoop@localhost hadoop]$ tar -zxvf hadoop-2.7.6.tar.gz 

As root, create the symbolic link

[root@localhost hadoop-2.7.6]# ln -s /home/hadoop/apps/hadoop/hadoop-2.7.6 /usr/local/hadoop

As root, change the owner of the link

[root@localhost hadoop-2.7.6]# chown -R hadoop:hadoop /usr/local/hadoop

Add the hadoop environment variables

[root@localhost hadoop-2.7.6]# vim /etc/profile

Add the following:

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=$HADOOP_HOME
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=${JAVA_HOME}/bin:$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin

Save the file, then reload the configuration

[root@localhost hadoop-2.7.6]# source /etc/profile

As the hadoop user, go to the Hadoop configuration directory

[hadoop@localhost hadoop]$ cd /usr/local/hadoop/etc/hadoop/

Edit the hadoop-env.sh file

[hadoop@localhost hadoop]$ vim hadoop-env.sh 

Set the JDK path

export JAVA_HOME=/usr/local/java
  • Configure core-site.xml as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0 -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- Name of the HDFS nameservice namespace: ns -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns</value>
  </property>
  <!-- Hadoop temp directory; the default under /tmp/{$user} is unsafe because it is wiped on every reboot -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hdpdata/</value>
    <description>the hdpdata directory must be created manually</description>
  </property>
  <!-- ZooKeeper quorum -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    <description>zookeeper addresses, comma separated</description>
  </property>
</configuration>

Create the hdpdata directory by hand

[hadoop@hadoop01 hadoop]$ mkdir hdpdata
  • Configure hdfs-site.xml as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0 -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- NameNode HA configuration -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns</value>
    <description>the HDFS nameservice id; must match the value in core-site.xml</description>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns</name>
    <value>nn1,nn2</value>
    <description>the ns nameservice has two NameNodes with the logical ids nn1 and nn2 (the names are arbitrary)</description>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns.nn1</name>
    <value>hadoop01:9000</value>
    <description>RPC address of nn1</description>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns.nn1</name>
    <value>hadoop01:50070</value>
    <description>HTTP address of nn1</description>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns.nn2</name>
    <value>hadoop02:9000</value>
    <description>RPC address of nn2</description>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns.nn2</name>
    <value>hadoop02:50070</value>
    <description>HTTP address of nn2</description>
  </property>
  <!-- JournalNode configuration -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop03:8485;hadoop04:8485;hadoop05:8485/ns</value>
    <description>where the NameNode edits metadata is stored on the JournalNodes</description>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/hadoop/journaldata</value>
    <description>local directory where the JournalNode keeps its data; must exist beforehand</description>
  </property>
  <!-- NameNode automatic failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
    <description>enable automatic NameNode failover</description>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    <description>failover implementation, using the built-in zkfc</description>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
    <description>fencing methods, one per line; sshfence runs first, and if it fails shell(/bin/true) runs and returns 0 (success)</description>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
    <description>the sshfence method requires passwordless SSH</description>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
    <description>sshfence timeout</description>
  </property>
  <!-- HDFS file properties -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>block replication factor of 3</description>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
    <description>block size of 128 MB</description>
  </property>
</configuration>
  • Configure yarn-site.xml as follows:

<?xml version="1.0"?>
<!-- Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0 -->
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Cluster id, the logical id shared by the pair of HA ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-ha</value>
  </property>
  <!-- Logical ids of the ResourceManagers (arbitrary names) -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Addresses of the two ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>${yarn.resourcemanager.hostname.rm1}:8088</value>
    <description>HTTP port</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>${yarn.resourcemanager.hostname.rm2}:8088</value>
  </property>
  <!-- ZooKeeper cluster addresses -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
  <!-- Auxiliary service on the NodeManagers; must be mapreduce_shuffle to run MapReduce jobs -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- HDFS directory for aggregated logs -->
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/data/hadoop/yarn-logs</value>
  </property>
  <!-- Log retention of 3 days, in seconds -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>259200</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
    <description>minimum memory a single task can request; default 1024 MB</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>1</value>
  </property>
</configuration>
  • Configure mapred-site.xml as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0 -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>run MapReduce on YARN</description>
  </property>
  <!-- JobHistory server configuration -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop02:10020</value>
    <description>history server port</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop02:19888</value>
    <description>history server web UI port</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.joblist.cache.size</name>
    <value>2000</value>
    <description>number of history files cached in memory (mainly the per-job file directories)</description>
  </property>
</configuration>

Edit the slaves file to set the hostnames on which DataNode and NodeManager are started

[hadoop@hadoop01 hadoop]$ pwd
/usr/local/hadoop/etc/hadoop
[hadoop@hadoop01 hadoop]$ vim slaves

Add the data nodes to the file:

hadoop03
hadoop04
hadoop05
  • Set up passwordless login for the hadoop user
    Configure passwordless login from hadoop01 to hadoop01, hadoop02, hadoop03, hadoop04 and hadoop05.

The SSH key for hadoop01 was generated earlier and has already been distributed to hadoop01, hadoop02 and hadoop03, so it only needs to be copied to hadoop04 and hadoop05:

ssh-copy-id -i hadoop04
ssh-copy-id -i hadoop05

On hadoop02, generate a key pair as the hadoop user and distribute it to every node

[hadoop@hadoop02 hadoop]$ ssh-keygen -t rsa

Distribute the public key to all nodes

[hadoop@hadoop02 hadoop]$ ssh-copy-id -i hadoop01
[hadoop@hadoop02 hadoop]$ ssh-copy-id -i hadoop02
[hadoop@hadoop02 hadoop]$ ssh-copy-id -i hadoop03
[hadoop@hadoop02 hadoop]$ ssh-copy-id -i hadoop04
[hadoop@hadoop02 hadoop]$ ssh-copy-id -i hadoop05

Copy the configured hadoop directory to the other nodes

[hadoop@hadoop01 hadoop]$ scp -r /home/hadoop/apps/hadoop/hadoop-2.7.6 hadoop@hadoop02:/home/hadoop/apps/hadoop
[hadoop@hadoop01 hadoop]$ scp -r /home/hadoop/apps/hadoop/hadoop-2.7.6 hadoop@hadoop03:/home/hadoop/apps/hadoop
[hadoop@hadoop01 hadoop]$ scp -r /home/hadoop/apps/hadoop/hadoop-2.7.6 hadoop@hadoop04:/home/hadoop/apps/hadoop
[hadoop@hadoop01 hadoop]$ scp -r /home/hadoop/apps/hadoop/hadoop-2.7.6 hadoop@hadoop05:/home/hadoop/apps/hadoop

Then run the following steps on each node:

Step 1: as root, create the symbolic link
    ln -s /home/hadoop/apps/hadoop/hadoop-2.7.6 /usr/local/hadoop
Step 2: as root, change the owner of the link
    chown -R hadoop:hadoop /usr/local/hadoop
Step 3: as root, add the environment variables (vim /etc/profile) and append:
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export YARN_HOME=$HADOOP_HOME
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
Step 4: as root, reload the environment variables so the configuration takes effect
    source /etc/profile


6. Shuffle Read

  • hash-based and sort-based shuffle share the same shuffle read implementation

    Figure 6: Shuffle Read

5. Starting the Cluster

First start a JournalNode on each of hadoop03, hadoop04 and hadoop05:

[hadoop@hadoop03 hadoop]$ /usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode

[hadoop@hadoop05 hadoop]$ jps
3841 Jps
3806 JournalNode

Stop and disable the firewall on all nodes

[root@hadoop05 hadoop]# systemctl stop firewalld.service
[root@hadoop05 hadoop]# systemctl disable firewalld.service

Run the following command on hadoop01:

[hadoop@hadoop01 bin]$ hdfs namenode -format

If the format succeeds, output like the following appears:

Figure 7: Successful HDFS format output

After a successful format, a dfs folder is created under the path set by hadoop.tmp.dir in core-site.xml; copy that folder to the same path on hadoop02:

[hadoop@hadoop01 hdpdata]$ scp -r /usr/local/hadoop/hdpdata hadoop@hadoop02:/usr/local/hadoop/
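Next, initialize the HA state in ZooKeeper by formatting ZKFC on hadoop01: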

[hadoop@hadoop01 hdpdata]$ hdfs zkfc -formatZK

When it completes, output like the following appears:

Figure 8: Output of hdfs zkfc -formatZK

  • Start HDFS

[hadoop@hadoop01 hdpdata]$ start-dfs.sh
  • Start YARN

[hadoop@hadoop01 hdpdata]$ start-yarn.sh
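At this point each node's processes can be compared against the cluster plan table above; assuming the passwordless SSH configured earlier, a quick check is:

[hadoop@hadoop01 hdpdata]$ for i in 1 2 3 4 5; do echo hadoop0$i; ssh hadoop0$i "source /etc/profile; jps"; done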

On hadoop02, start a separate ResourceManager as the standby

[hadoop@hadoop02 hadoop]$ sbin/yarn-daemon.sh start resourcemanager

Start the JobHistoryServer on hadoop02

[hadoop@hadoop02 hadoop]$ sbin/mr-jobhistory-daemon.sh start historyserver

Management web UI addresses:

NameNode: http://192.168.159.20:50070
NameNode: http://192.168.159.21:50070
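One of the two NameNodes should report itself as active and the other as standby; this can also be checked from the command line with hdfs haadmin -getServiceState nn1 (and nn2), using the NameNode ids defined in hdfs-site.xml above.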

Resource manager web UI:

Figure 9: NameNode web UI

ResourceManager HTTP address:
ResourceManager: http://192.168.159.20:8088
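Likewise, the active/standby state of the two ResourceManagers can be checked with yarn rmadmin -getServiceState rm1 (and rm2), using the ids defined in yarn-site.xml above.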

Figure 10: ResourceManager web UI
