Hadoop 2.7.1 Cluster Setup

Environment Preparation

  • Virtualization platform: VMware Workstation
  • Operating system: Ubuntu 16.10
  • Hadoop version: 2.7.1
  • JDK version: 1.8.0_121
  • Cluster nodes:
    node1: 192.168.15.128 (master)
    node2: 192.168.15.130 (slave)
    node3: 192.168.15.131 (slave)

Configure Host Name Resolution

sudo vi /etc/hosts

Add the following:

# Hadoop cluster node name resolution
192.168.15.128 node1
192.168.15.130 node2
192.168.15.131 node3
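
As a quick optional check, the names should now resolve from any node, for example:

ping -c 1 node2
ping -c 1 node3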

Create the hadoop User and Group

sudo groupadd hadoop
sudo useradd -d /home/hadoop -g hadoop -m hadoop
sudo passwd hadoop

To give the hadoop user sudo privileges, switch to the root user and run:

visudo

Add the line:

hadoop ALL=(ALL) ALL

Configure Passwordless SSH Login

(1) Generate a key pair on each node
ssh-keygen -t rsa

Press Enter at every prompt. This generates two files, id_rsa and id_rsa.pub, in /home/hadoop/.ssh.

cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
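
If passwordless login later still prompts for a password, a common cause is file permissions; SSH expects roughly the following:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys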
(2) Enable public key authentication

Log in as root, edit the SSH configuration file /etc/ssh/sshd_config, and check that the following lines are uncommented:

RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile %h/.ssh/authorized_keys

Restart the SSH service:

service ssh restart
(3) Copy the public key to the slave nodes
scp ~/.ssh/authorized_keys node2:/home/hadoop/.ssh
scp ~/.ssh/authorized_keys node3:/home/hadoop/.ssh
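
As a quick check, logging in from node1 to a slave should now work without a password, for example:

ssh hadoop@node2 hostname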

Install Hadoop

(1) Unpack Hadoop 2.7.1

Unpack the binary distribution in the /home/hadoop directory:

tar -xvzf hadoop-2.7.1.tar.gz
rm hadoop-2.7.1.tar.gz
(2) Edit /etc/profile

#set hadoop path
export HADOOP_HOME=/home/hadoop/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin

Apply the change:

source /etc/profile
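
To confirm the PATH change took effect (assuming the JDK is already installed and JAVA_HOME is set), you can run:

hadoop version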

Cluster Configuration

(1) Edit ~/hadoop-2.7.1/etc/hadoop/hadoop-env.sh

Add:

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_121/

(2) Edit ~/hadoop-2.7.1/etc/hadoop/yarn-env.sh

Add:

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_121/

(3) Configure core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>
(4) Configure hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>node1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node2:50090</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
(5) Configure mapred-site.xml
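
In a fresh Hadoop 2.7.1 distribution this file usually exists only as mapred-site.xml.template; if mapred-site.xml is not present yet, copy the template first:

cp ~/hadoop-2.7.1/etc/hadoop/mapred-site.xml.template ~/hadoop-2.7.1/etc/hadoop/mapred-site.xml
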
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>node1:9001</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node1:19888</value>
  </property>
</configuration>
(6) Configure yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>node1:9001</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node1:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>node1:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>node1:18035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>node1:18088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
(7) Configure the slaves file

In the slaves file (~/hadoop-2.7.1/etc/hadoop/slaves), add:

node2
node3

Copy Hadoop to the Other Nodes

scp -r /home/hadoop/hadoop-2.7.1 hadoop@node2:/home/hadoop/
scp -r /home/hadoop/hadoop-2.7.1 hadoop@node3:/home/hadoop/

Also apply the /etc/profile additions on each node.
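
For example, on node2 and node3 the same two lines can be appended as the root user:

echo 'export HADOOP_HOME=/home/hadoop/hadoop-2.7.1' >> /etc/profile
echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> /etc/profile
source /etc/profile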

Start Hadoop

Switch to the hadoop user and format the NameNode (only needed the first time):

~/hadoop-2.7.1/bin/hdfs namenode -format

Start Hadoop:

~/hadoop-2.7.1/sbin/start-dfs.sh
~/hadoop-2.7.1/sbin/start-yarn.sh

Stop Hadoop:

~/hadoop-2.7.1/sbin/stop-dfs.sh
~/hadoop-2.7.1/sbin/stop-yarn.sh

Check the Cluster Status

(1) Check the running processes
jps
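
With the configuration above, each node should show roughly the following daemons (process IDs omitted):

node1: NameNode, ResourceManager
node2: DataNode, NodeManager, SecondaryNameNode
node3: DataNode, NodeManager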
(2) Check via the web UI

HDFS: http://192.168.15.128:50070
YARN: http://192.168.15.128:18088
