Building a Hadoop Cluster

Guide for building a 3-node Hadoop cluster. This is based on
AlmaLinux release 9.5

Master    hadoopm.home  192.168.1.153
Worker 1  hadoopw1.home 192.168.1.154
Worker 2  hadoopw2.home 192.168.1.155

On all nodes I have done the following 

yum -y install epel-release
yum -y install pdsh
dnf -y install pdsh-rcmd-ssh

Add the following line to .bashrc 

export PDSH_RCMD_TYPE=ssh

Downloaded hadoop-3.4.1.tar.gz from
https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
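
For example, on each node (putting it in /tmp to match the extract step later; this assumes wget is installed, curl -O works too):

cd /tmp
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz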

Added the following to /etc/hosts on all servers

192.168.1.153 hadoopm.home hadoopm
192.168.1.154 hadoopw1.home hadoopw1
192.168.1.155 hadoopw2.home hadoopw2

As root on all the nodes

groupadd hadoop
useradd -m hduser
usermod -aG hadoop hduser
usermod -aG wheel hduser
passwd hduser

Copied JDK 8 to all servers as /tmp/jdk-8u421-linux-x64.tar.gz

cd /opt

tar xvf /tmp/jdk-8u421-linux-x64.tar.gz
ln -s jdk1.8.0_421/ jdk
chown -R hduser:hadoop jdk
chown -R hduser:hadoop jdk1.8.0_421
tar xvf /tmp/hadoop-3.4.1.tar.gz
ln -s hadoop-3.4.1 hadoop
chown -R hduser:hadoop hadoop
chown -R hduser:hadoop hadoop-3.4.1
sudo update-alternatives --install /usr/bin/java java /opt/jdk/bin/java 100
sudo update-alternatives --install /usr/bin/javac javac /opt/jdk/bin/javac 100
sudo update-alternatives --display java
sudo update-alternatives --display javac
sudo java -version

As hduser, add the following to .bashrc

export JAVA_HOME=/opt/jdk
export PATH=$JAVA_HOME/bin:$PATH
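
Optionally, the Hadoop scripts can also go on the PATH in the same .bashrc (this is a convenience, assuming the /opt/hadoop symlink created earlier; the later commands in this guide use paths relative to /opt/hadoop either way):

export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH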

I messed up the user creation as I ran out of space on /home, so I had to manually copy over some of the files.

As root:

cp -r /etc/skel/. /home/hduser
chown -R hduser:hduser /home/hduser

Then reload .bashrc as hduser:

source .bashrc

Set up keyless ssh for hduser between the nodes.
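
A minimal sketch, run as hduser on the master (the hostnames are the ones added to /etc/hosts above):

ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
for host in hadoopm hadoopw1 hadoopw2; do
  ssh-copy-id hduser@$host
done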

Add the following line to each of these files:

export JAVA_HOME=/opt/jdk


yarn-env.sh
mapred-env.sh
hadoop-env.sh

The above files are located in /opt/hadoop/etc/hadoop
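
One way to do this on each node (a quick sketch using that same path):

cd /opt/hadoop/etc/hadoop
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  echo 'export JAVA_HOME=/opt/jdk' >> $f
done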

On the master node edit core-site.xml

Add the following between the <configuration> tags:

<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.153:50000</value>
</property>

This needs to be copied to the worker nodes too.
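For example, with scp from the master as hduser (paths assume the /opt/hadoop symlink on every node):

scp /opt/hadoop/etc/hadoop/core-site.xml hduser@hadoopw1:/opt/hadoop/etc/hadoop/
scp /opt/hadoop/etc/hadoop/core-site.xml hduser@hadoopw2:/opt/hadoop/etc/hadoop/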

Edit yarn-site.xml

Add the following on all the nodes, changing localhost to the master's IP (192.168.1.153); the substituted values are shown after the block.

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> 
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>

Edit hdfs-site.xml and add the following:

<property>
<name>dfs.namenode.name.dir</name>
<value>/data/hadoop/namenode-dir</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/hadoop/datanode-dir</value>
</property>

On all the nodes run 

mkdir -p /data/hadoop
chown -R hduser:hadoop /data
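
Since pdsh is installed, the same two commands can be pushed to every node from the master in one go (a sketch, assuming root can ssh to the other nodes; hostnames are the ones from /etc/hosts):

pdsh -w hadoopm,hadoopw1,hadoopw2 "mkdir -p /data/hadoop && chown -R hduser:hadoop /data"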

Edit mapred-site.xml and add the following on all nodes:

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
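
On Hadoop 3.x, MapReduce jobs also need to know where the Hadoop install lives; if jobs later fail with class-not-found errors, adding properties like these to mapred-site.xml is the usual fix (values assume the /opt/hadoop symlink used here):

<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>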

Only on the master, update the workers file (in /opt/hadoop/etc/hadoop) and add the IPs of all the nodes that should run worker daemons (the master is included here too):

cat workers

192.168.1.153
192.168.1.154
192.168.1.155

Run the format only from the master node (as hduser, from /opt/hadoop):

bin/hdfs namenode -format


From the master node, run sbin/start-all.sh

The NameNode web GUI will be running at http://192.168.1.153:9870/
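
To confirm everything came up, run jps on each node (the JDK's jps is on the PATH from the .bashrc changes above):

jps

On the master this should show NameNode, SecondaryNameNode and ResourceManager (plus DataNode and NodeManager, since the master is also listed in workers); on the workers, DataNode and NodeManager. The YARN ResourceManager web UI is normally at http://192.168.1.153:8088/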