
by Sourabh Purohit October 13, 2020

Hadoop v1 vs Hadoop v2, and how to set up a Hadoop 2 cluster


Hadoop v1 vs Hadoop v2

Hadoop V1:

Challenges:

  1.  Supports batch processing only, i.e. only MapReduce-style processing is possible.
  2.  Single point of failure: if the NameNode goes down, the whole cluster is unavailable.
  3.  External data stores are needed for real-time processing or graph analysis.
  4.  Doesn't support multi-tenancy (the cluster can't be shared by multiple processing frameworks at the same time).
  5.  Doesn't scale well beyond roughly 4,000 nodes per cluster.

 

Hadoop V2:

  1.  HDFS federation.
  2.  Multiple NameNodes.
  3.  YARN handles resource management and job scheduling; MapReduce now runs as one YARN application among others.
  4.  Better control over processing.
  5.  Support for non-MapReduce workloads.
  6.  Support for multi-tenancy.

 

For a better view, see the architecture diagrams below:

.  HDFS

[HDFS architecture diagram]

.  YARN

[YARN architecture diagram]

The main YARN daemons are:

1.  ResourceManager: manages the resources of all the DataNodes. In this setup, the ResourceManager runs on the NameNode machine.

.  Generally there is one ResourceManager per cluster.

.  You may run one active ResourceManager and one standby ResourceManager for high availability.


 

2.  NodeManager

.  It launches and monitors containers on each slave node.

 

3.  ApplicationMaster

It manages and schedules the tasks of a single MapReduce job.

 

----------------------------------------------------------------------------------------------------------

Readers will need some conceptual knowledge of Big Data and the Hadoop framework, because here I am going to discuss only the steps for setting up a Hadoop 2 cluster.

 

For this two-node cluster, I am assuming we have two different hardware machines with RHEL/CentOS 7 installed:

one machine for the NameNode and one machine for the DataNode.

 

Important:  if your lab or virtual machines don't have a DNS server, then please follow the steps given below.

 

Note:

  1. For web portal communication, every node must be able to reach every other node by both IP address and hostname.
  2. You can use the hostnamectl command to set the hostname of any system, as shown below.



 

Step 1:  Set the hostname on each node

 

[root@namenode~]# hostnamectl set-hostname  namenode.cluster1.com

 

On RHEL/CentOS 7, hostnamectl set-hostname already writes the name to /etc/hostname, which makes it persistent. You can verify it like this:

[root@namenode ~]# cat  /etc/hostname

namenode.cluster1.com
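Repeat this on the second machine with its own name (the hostname below matches the cluster layout used throughout this post):

[root@datanode1 ~]# hostnamectl set-hostname  datanode1.cluster1.com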

 


 

Step 2:  Set up local name resolution on every system; the /etc/hosts file will look like this:


 

[root@namenode ~]#    cat   /etc/hosts


 

192.168.0.254         namenode.cluster1.com

192.168.0.200         datanode1.cluster1.com
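A quick sanity check that name resolution works (a plain ping, nothing Hadoop-specific):

[root@namenode ~]# ping -c 2 datanode1.cluster1.com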



 

Step 3:  Flush the firewall and disable SELinux on each node. Hadoop uses many ports, so flushing all firewall rules keeps the lab network setup simple.

 

[root@namenode ~]# setenforce  0      # disable SELinux temporarily

[root@namenode ~]# iptables  -F       # flush all firewall rules

 

OR

 

[root@namenode ~]# systemctl disable --now firewalld   # for production, add specific firewall rules instead
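For production, here is a sketch of opening only the ports this tutorial actually uses, instead of disabling the firewall. The port numbers match the configs later in this post plus the stock Hadoop 2 web UI defaults; adjust them if your setup differs:

[root@namenode ~]# firewall-cmd --permanent --add-port=10002/tcp    # fs.defaultFS (NameNode RPC, configured below)

[root@namenode ~]# firewall-cmd --permanent --add-port=50070/tcp    # NameNode web UI (Hadoop 2 default)

[root@namenode ~]# firewall-cmd --permanent --add-port=8025/tcp --add-port=8030/tcp --add-port=8032/tcp    # ResourceManager (configured below)

[root@namenode ~]# firewall-cmd --permanent --add-port=8088/tcp     # ResourceManager web UI (Hadoop 2 default)

[root@namenode ~]# firewall-cmd --reload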

 

 

Step 4:  Make sure yum is configured for software installation on each node, or download the packages from the Apache/Oracle websites.




 

System 1  (NameNode)

IP Address:   192.168.0.254
Hostname:     namenode.cluster1.com

System 2  (DataNode)

IP Address:   192.168.0.200
Hostname:     datanode1.cluster1.com

 



 

 

 

 

Note:  To build a Hadoop cluster, you need to install JDK 1.8 or higher and Hadoop 2 on each node.

 

Important:  I am assuming that you already have a yum client configured, or you can download the software from the Oracle or Apache websites.


 

Hadoop v2  cluster setup :

---------------------------------

 

Required software  : 

 

1.  Download Apache Hadoop version 2.7 or later from the link below:

 

https://archive.apache.org/dist/hadoop/core/

 

2.  Download JDK 1.8 from the Oracle website and install it using rpm:

 

http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

 

Or download the JDK package directly by running this command in your terminal:

 

wget -c --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.rpm


 

Install the JDK using RPM (the file name matches the package downloaded above):

[root@namenode ~]# rpm -ivh jdk-8u131-linux-x64.rpm
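Verify the installation (java -version ships with the JDK and should print the installed version):

[root@namenode ~]# java -version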





 

3.  Set the JDK path. The path below assumes JDK 1.8.0_121; adjust it to match the version you actually installed (the 8u131 RPM above installs to /usr/java/jdk1.8.0_131).

[root@namenode ~]#  cat  /root/.bashrc

# .bashrc

 

# User specific aliases and functions

 

alias rm='rm -i'

alias cp='cp -i'

alias mv='mv -i'

 

# Source global definitions

if [ -f /etc/bashrc ]; then

. /etc/bashrc

fi


 

JAVA_HOME=/usr/java/jdk1.8.0_121


 

4.  Install Apache Hadoop from the tar archive (note the lowercase file name):

[root@namenode hadoop]# tar  -xvzf  hadoop-2.7.3.tar.gz


 

5.  Move it to /  for permission and security reasons:

[root@namenode hadoop]#   mv  hadoop-2.7.3  /hadoop2

 

Important:  after adding the Hadoop variables, the file must look like this:


 

[root@namenode hadoop]# cat /root/.bashrc

# .bashrc

 

# User-specific aliases and functions

 

alias rm='rm -i'

alias cp='cp -i'

alias mv='mv -i'

 

# Source global definitions

if [ -f /etc/bashrc ]; then

    . /etc/bashrc

fi

 

JAVA_HOME=/usr/java/jdk1.8.0_121

HADOOP_HOME=/hadoop2

PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

export JAVA_HOME HADOOP_HOME PATH
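Reload the file so the new variables take effect in the current shell, then verify the Hadoop binaries are on the PATH:

[root@namenode hadoop]# source /root/.bashrc

[root@namenode hadoop]# hadoop version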


 

Note:  inside the Hadoop configuration directory you will find files like those shown below.


 

6. Go into the Hadoop directory 

 

[root@namenode hadoop]# cd /hadoop2/etc/hadoop/

[root@namenode hadoop]# ls

capacity-scheduler.xml      hadoop-policy.xml        kms-log4j.properties        slaves

configuration.xsl           hdfs-site.xml            kms-site.xml                ssl-client.xml.example

container-executor.cfg      httpfs-env.sh            log4j.properties            ssl-server.xml.example

core-site.xml               httpfs-log4j.properties  mapred-env.cmd              yarn-env.cmd

hadoop-env.cmd              httpfs-signature.secret  mapred-env.sh               yarn-env.sh

hadoop-env.sh               httpfs-site.xml          mapred-queues.xml.template  yarn-site.xml

hadoop-metrics2.properties  kms-acls.xml             mapred-site.xml

hadoop-metrics.properties   kms-env.sh               mapred-site.xml.template

 

7.  Now set up some important files for the HDFS cluster:

1.  In hadoop-env.sh, set the JDK path; it will then look like this:

 

[root@namenode hadoop]# cat   hadoop-env.sh

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
export JAVA_HOME=/usr/java/jdk1.8.0_121

 

2.  hdfs-site.xml will look like this (dfs.namenode.name.dir is the local directory where the NameNode keeps its metadata):

 

[root@namenode hadoop]# cat hdfs-site.xml 

 

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>dfs.namenode.name.dir</name>

<value>/nnnhhh22</value>

</property>

</configuration>

 

3.  core-site.xml will look like this; make sure the NameNode machine's IP address is correct (192.168.0.254 in this setup):

 

[root@namenode hadoop]# cat core-site.xml 

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://192.168.0.254:10002</value>

</property>

</configuration>



 

8.  Format the NameNode and start the services

i) Format the NameNode:

[root@namenode hadoop]# hdfs  namenode  -format

 

ii) Start the NameNode service (the DataNode service is started in step 11 below):

 

[root@namenode hadoop]# hadoop-daemon.sh  start  namenode 
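You can confirm the daemon is running with jps (included with the JDK); its output should list a NameNode process. The NameNode web UI is on port 50070 by default:

[root@namenode hadoop]# jps

Browser:  http://192.168.0.254:50070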

 

-----------------------------------------------------------------------------------------------------------------

Now set up the DataNode:


 

Note:  Repeat the same JDK and Hadoop v2 installation steps (including the .bashrc changes) on the DataNode machine.

 

9.  After downloading Hadoop 2 and installing Java on the DataNode, go to the configuration directory:

[root@datanode1 hadoop]# cd /hadoop2/etc/hadoop/

[root@datanode1 hadoop]# ls

 

10.  Now configure the DataNode by making an entry in hdfs-site.xml (dfs.datanode.data.dir below is the local directory where the DataNode stores HDFS blocks):

 

[root@datanode1 hadoop]# cat hdfs-site.xml 

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>dfs.datanode.data.dir</name>

<value>/nnnhhh22ddnn</value>

</property>

 

</configuration>

 

 

[root@datanode1 hadoop]#  cat core-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://192.168.0.254:10002</value>

</property>

</configuration>

 

11.  Start the DataNode service:

 

[root@datanode1 hadoop]# hadoop-daemon.sh  start  datanode
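Back on the NameNode, check that the DataNode has registered with the cluster; hdfs dfsadmin -report should show one live DataNode:

[root@namenode hadoop]# hdfs  dfsadmin  -report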

 

----------------------------------------------------------------------

 

Now it's time for the YARN cluster setup:

 

MR2 supports three framework modes, set via mapreduce.framework.name:

Local:   runs the whole job in a single local JVM; no cluster daemons are needed.

Classic: runs the old MR1 (JobTracker/TaskTracker) framework.

Yarn:    runs on a multi-node cluster; it requires a NodeManager on each slave, with the mapreduce_shuffle auxiliary service enabled on the NodeManager side.

.  In this setup, the ResourceManager and the client run on the same node.


 

1.  mapred-site.xml will look like this (see the note just below if the file doesn't exist yet):
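If mapred-site.xml does not exist yet (stock Hadoop 2 ships only mapred-site.xml.template, visible in the directory listing of step 6), create it from the template first:

[root@namenode hadoop]# cp mapred-site.xml.template mapred-site.xml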

 

[root@namenode hadoop]# cat  mapred-site.xml

 

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

 

2.  yarn-site.xml will look like this (replace resource_masterip with the ResourceManager machine's IP, 192.168.0.254 in this setup):


 

[root@namenode hadoop]# cat  yarn-site.xml


 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>resource_masterip:8025</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>resource_masterip:8030</value>
</property>

<property>
<name>yarn.resourcemanager.address</name>
<value>resource_masterip:8032</value>
</property>

</configuration>

 

[root@namenode hadoop]#   yarn-daemon.sh  start resourcemanager
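Check it with jps (a ResourceManager process should appear in the output); the ResourceManager web UI is on port 8088 by default:

[root@namenode hadoop]# jps

Browser:  http://192.168.0.254:8088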





 

Now for the NodeManager (slave):

 

---------------------------------------------


 

[root@datanode1 hadoop]#  vim yarn-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>resource_masterip:8025</value>
</property>

</configuration>

Note: this file lives on the DataNode, so the prompt is datanode1; again, replace resource_masterip with 192.168.0.254 in this setup.
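Finally, start the NodeManager on the DataNode and smoke-test the cluster. The example jar path below assumes Hadoop 2.7.3 extracted to /hadoop2 as done earlier:

[root@datanode1 hadoop]#  yarn-daemon.sh  start  nodemanager

[root@namenode hadoop]#  yarn  node  -list        # datanode1 should show up as RUNNING

[root@namenode hadoop]#  yarn  jar  /hadoop2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar  pi  2  10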


 
