Guide for building a 3-node Hadoop cluster, based on
AlmaLinux release 9.5
Master   hadoopm.home  192.168.1.153
Worker 1 hadoopw1.home 192.168.1.154
Worker 2 hadoopw2.home 192.168.1.155
On all nodes I have done the following:
yum -y install epel-release
yum -y install pdsh
dnf -y install pdsh-rcmd-ssh
Add the following line to .bashrc
export PDSH_RCMD_TYPE=ssh
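A quick sanity check that pdsh will use ssh (the fan-out example in the comment assumes the hostnames added to /etc/hosts below and working passwordless SSH):

```shell
# Verify pdsh is set to use ssh rather than the default rsh.
export PDSH_RCMD_TYPE=ssh
echo "$PDSH_RCMD_TYPE"   # prints: ssh
# Once the hosts resolve and keys are in place, a fan-out test would be:
#   pdsh -w hadoopw1,hadoopw2 uptime
```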
Downloaded hadoop-3.4.1.tar.gz from:
https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
Added the following to the /etc/hosts on all servers
192.168.1.153 hadoopm.home hadoopm
192.168.1.154 hadoopw1.home hadoopw1
192.168.1.155 hadoopw2.home hadoopw2
As root on all the nodes
groupadd hadoop
useradd -m hduser
usermod -aG hadoop hduser
usermod -aG wheel hduser
passwd hduser
Copied JDK 8 to all servers as /tmp/jdk-8u421-linux-x64.tar.gz
cd /opt
tar xvf /tmp/jdk-8u421-linux-x64.tar.gz
ln -s jdk1.8.0_421/ jdk
chown -R hduser:hadoop jdk
chown -R hduser:hadoop jdk1.8.0_421
tar xvf /tmp/hadoop-3.4.1.tar.gz
ln -s hadoop-3.4.1 hadoop
chown -R hduser:hadoop hadoop
chown -R hduser:hadoop hadoop-3.4.1
sudo update-alternatives --install /usr/bin/java java /opt/jdk/bin/java 100
sudo update-alternatives --install /usr/bin/javac javac /opt/jdk/bin/javac 100
sudo update-alternatives --display java
sudo update-alternatives --display javac
sudo java -version
As hduser added the following to .bashrc
export JAVA_HOME=/opt/jdk
export PATH=$JAVA_HOME/bin:$PATH
I messed up the user creation as I ran out of space on /home, so I had to manually copy over some of the files
As root cp -r /etc/skel/. /home/hduser
chown -R hduser:hduser /home/hduser
source .bashrc
Set up passwordless (key-based) SSH for hduser between all the nodes
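A sketch of that setup, run as hduser on the master (hostnames assume the /etc/hosts entries above; ssh-copy-id asks for the hduser password once per node):

```shell
# Generate a key pair for hduser if one does not already exist.
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -b 4096 -N "" -q -f "$HOME/.ssh/id_rsa"
# Push the public key to every node (the master must be able to ssh to itself too).
for h in hadoopm hadoopw1 hadoopw2; do
  ssh-copy-id "hduser@$h" || echo "copy to $h failed (is the node up?)"
done
```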
Add the following line:
export JAVA_HOME=/opt/jdk
to each of these files, located in /opt/hadoop/etc/hadoop:
yarn-env.sh
mapred-env.sh
hadoop-env.sh
On the master node edit core-site.xml
Add the following in between the <configuration> tags
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.153:50000</value>
</property>
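For reference, the whole file then looks like this (fs.default.name still works but is deprecated; fs.defaultFS is the current name for the same setting, and 50000 is simply the NameNode RPC port this guide picked):

```xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.153:50000</value>
</property>
</configuration>
```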
This needs to be copied to the worker nodes too
Edit yarn-site.xml
Add in the following on all the nodes, changing localhost to the IP of the master
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
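With that substitution done for this cluster, the two ResourceManager properties come out as follows (8032 is YARN's default ResourceManager port, kept as-is):

```xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>192.168.1.153</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>192.168.1.153:8032</value>
</property>
```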
Edit hdfs-site.xml and add the following
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/hadoop/namenode-dir</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/hadoop/datanode-dir</value>
</property>
On all the nodes run
mkdir -p /data/hadoop
chown -R hduser:hadoop /data
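Since pdsh is already installed, the same two commands can be fanned out from the master in one shot (run as root; hostnames assume the /etc/hosts entries above):

```shell
# Create the HDFS data directory and hand it to hduser on every node at once.
export PDSH_RCMD_TYPE=ssh
pdsh -w hadoopm,hadoopw1,hadoopw2 'mkdir -p /data/hadoop && chown -R hduser:hadoop /data'
```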
vi mapred-site.xml and add the following on all nodes
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
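One known Hadoop 3.x gotcha worth noting here: MapReduce jobs can later fail with "Could not find or load main class MRAppMaster" unless the MapReduce home is passed through to the containers. If that happens, these extra properties in mapred-site.xml usually fix it (the /opt/hadoop path matches this guide; this is an optional addition, not part of the original setup):

```xml
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>
```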
Only on the master, update workers and add the IPs of all the nodes (the master is listed too, so it also runs a DataNode)
cat workers
192.168.1.153
192.168.1.154
192.168.1.155
Only run the format from the master node (as hduser, from /opt/hadoop):
bin/hdfs namenode -format
From the master node run sbin/start-all.sh (or sbin/start-dfs.sh followed by sbin/start-yarn.sh)
The NameNode web GUI will be at http://192.168.1.153:9870/
Run jps on each node to confirm the daemons: NameNode and ResourceManager on the master, DataNode and NodeManager on the workers