
by Sourabh Purohit October 13, 2020

HDFS basic commands


HDFS: Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications. It employs a NameNode and DataNode architecture to implement a distributed file system that provides high-performance access to data across highly scalable Hadoop clusters.

HDFS is a key part of many Hadoop ecosystem technologies, as it provides a reliable means of managing pools of big data and supporting related big data analytics applications.
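Before running file commands, it can be useful to confirm that the NameNode and DataNodes are up. Assuming a standard Hadoop installation, the dfsadmin report prints cluster capacity and DataNode status (output omitted here; the exact fields vary by Hadoop version):

[ sourabh@probog ]#  hadoop dfsadmin -report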

 

Now let us discuss some commonly used HDFS commands.

 

1. To list the current data in the Hadoop cluster (here, data means files and directories)

 

Case 1: From the NameNode

[ sourabh@probog ]#  hadoop fs -ls  /

Found 2 items

drwxr-xr-x   - root supergroup          0 2016-12-26 05:53 /probog

 

-rw-r--r--   3 root supergroup          0 2016-12-26 05:54 /probog.txt

 

Case 2: From a Hadoop client

 

[ sourabh@probog ]#  hadoop fs -ls  hdfs://172.17.0.2:10001/

 

Found 2 items

drwxr-xr-x   - root supergroup          0 2016-12-26 05:53 /probog

 

-rw-r--r--   3 root supergroup          0 2016-12-26 05:54 /probog.txt


 

Note: Here 172.17.0.2:10001 is the IP address and port of the NameNode.
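If you do not want to type the full hdfs:// URI on every client command, the NameNode address can be set once in core-site.xml. A minimal sketch, assuming a Hadoop 1.x-style installation (the property is fs.default.name in 1.x and fs.defaultFS in 2.x and later, and the file location depends on your install):

[ sourabh@probog ]#  cat $HADOOP_HOME/conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://172.17.0.2:10001</value>
  </property>
</configuration>

With this in place, hadoop fs -ls / on the client resolves to the same NameNode.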


 

2. To create a directory in HDFS

 

[ sourabh@probog ]# hadoop fs -mkdir  /probog2

 

[ sourabh@probog ]# hadoop fs -ls  /    

Found 3 items

drwxr-xr-x   - root supergroup          0 2016-12-26 05:53 /probog

drwxr-xr-x   - root supergroup          0 2016-12-26 06:02 /probog2

-rw-r--r--   3 root supergroup          0 2016-12-26 05:54 /probog.txt
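To create nested directories in a single command, Hadoop 2.x and later support the -p flag, which creates any missing parent directories (the path below is just an example):

[ sourabh@probog ]#  hadoop fs -mkdir -p /probog2/year/month/day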


 

3. To create an empty file

 

[ sourabh@probog ]# hadoop  fs  -touchz   /probog/newdata.txt


[ sourabh@probog ]# hadoop fs -ls  /probog

Found 2 items

-rw-r--r--   3 root supergroup    1400015 2016-12-26 05:53 /probog/abc.txt

-rw-r--r--   3 root supergroup          0 2016-12-26 06:03 /probog/newdata.txt
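Note that -touchz only creates a zero-length file. To write content into a new HDFS file directly, -put can read from stdin when the source is given as - (the file name below is hypothetical):

[ sourabh@probog ]#  echo "Hello from PROBOG" | hadoop fs -put - /probog/hello.txt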

 

4. To remove a file from HDFS

 

[ sourabh@probog ]# hadoop fs  -rm  /probog.txt

Deleted hdfs://172.17.0.2:10001/probog.txt
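If the HDFS trash feature is enabled (fs.trash.interval greater than 0 in core-site.xml), -rm moves the file to the user's trash directory instead of deleting it immediately. To delete it permanently in one step, use -skipTrash:

[ sourabh@probog ]#  hadoop fs -rm -skipTrash /probog.txt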

 

5. To remove a directory from HDFS

 

[ sourabh@probog ]# hadoop fs  -rmr  /probog

Deleted hdfs://172.17.0.2:10001/probog
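The -rmr command is deprecated in Hadoop 2.x and later; the equivalent modern form is -rm with the -r (recursive) flag:

[ sourabh@probog ]#  hadoop fs -rm -r /probog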


 

6. To copy files or folders from the local system to HDFS

 

[ sourabh@probog ]#  hadoop fs  -copyFromLocal /etc/passwd    /probog/  

 

OR  

 

[ sourabh@probog ]#  hadoop fs  -put  /etc/passwd    /probog/  

 

Note: You can upload multiple files in one command by separating their names with spaces, as shown below.
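A sketch with two common local files:

[ sourabh@probog ]#  hadoop fs -put /etc/passwd /etc/hosts /probog/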

 

7. To read a file from HDFS


 

[ sourabh@probog ]# hadoop  fs  -cat  /probog2/abc.txt

Hello from PROBOG
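The reverse of -put is -get (or its alias -copyToLocal), which downloads a file from HDFS to the local filesystem (the local destination path below is an example):

[ sourabh@probog ]#  hadoop fs -get /probog2/abc.txt /tmp/abc.txt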

 

8. To copy files within the HDFS filesystem

 

[ sourabh@probog ]# hadoop fs -cp  /probog2/abc.txt  /probog2/few
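To move or rename a file within HDFS instead of copying it, use -mv (the destination name below is hypothetical):

[ sourabh@probog ]#  hadoop fs -mv /probog2/few /probog2/few_renamed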



 

9. To change the replication factor of existing data in the HDFS cluster

 

[ sourabh@probog ]# hadoop fs -ls  /new/passwd

Found 1 items

-rw-r--r--   1 root supergroup        828 2017-01-08 07:36 /new/passwd

 

Note: Here the replication factor is 1 (the second column of the listing).

 

Changing the replication factor from 1 to 2:

 

[ sourabh@probog ]# hadoop fs -setrep 2 hdfs://172.17.0.2:10001/new/passwd

 

Replication 2 set: hdfs://172.17.0.2:10001/new/passwd

 

[ sourabh@probog ]# hadoop fs -ls  hdfs://172.17.0.2:10001/new/passwd

Found 1 items

-rw-r--r--   2 root supergroup        828 2017-01-08 07:36 /new/passwd
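When -setrep is given a directory, it changes the replication factor of every file under that directory. The -w flag makes the command wait until re-replication actually completes, which can take a while on large trees:

[ sourabh@probog ]#  hadoop fs -setrep -w 2 /new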



 

10. To set the replication factor and block size during upload

 

Important: By default, an Apache Hadoop cluster has a replication factor of 3 and a block size of 64 MB (128 MB in Hadoop 2.x and later). You can override these defaults at upload time as follows.


 

I) Replication factor

 

[ sourabh@probog ]#  hadoop fs -D dfs.replication=1 -put /etc/passwd  /new

 

II) Block size

 

[ sourabh@probog ]#  hadoop fs -D dfs.block.size=67108864 -put /etc/passwd  /new

 

The block size is specified in bytes (67108864 bytes = 64 MB).
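You can verify the block size and replication factor of an uploaded file with -stat, where %o prints the block size in bytes and %r the replication factor:

[ sourabh@probog ]#  hadoop fs -stat "%o %r" /new/passwd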

 

11. To check the number of files and directories and the space used

 

[ sourabh@probog ]#  hadoop  fs -count /

 

           2            1                828 hdfs://172.17.0.2:10001/
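Here the columns are the directory count (2), the file count (1), the total content size in bytes (828), and the path. For just the space used, -du with the -s summary flag works as well (in older releases the equivalent command is -dus):

[ sourabh@probog ]#  hadoop fs -du -s /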

 
