Steps to Install and Configure HDFS
This documentation describes how to set up an HDFS (v2.8.1) cluster with one NameNode and one DataNode.
Setup HDFS version 2.8.1
Prerequisites
Create two VMs. They will be referred to throughout this guide as node-master.example.com (master) and node-slave1.example.com (slave).
Install Java (java-8-openjdk) on all the machines in the cluster and set the JAVA_HOME environment variable accordingly. Get your Java installation path.
Note: Take the value of the current link and remove the trailing /bin/java. For example, on RHEL 7 the link is /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre/bin/java, so JAVA_HOME should be /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre.
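One way to find this path (a minimal sketch, assuming java is on the PATH and resolved through symlinks/alternatives) is:

```
# Resolve the real java binary and strip the trailing /bin/java to get JAVA_HOME
readlink -f "$(which java)" | sed 's|/bin/java$||'
```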
Edit ~/.bashrc:
Export JAVA_HOME={path-to-java} with your actual Java installation path. For example, with OpenJDK 8 on RHEL 7:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre
Note: In the later steps, when you log in to the hadoop account, set the Java path in ~/hadoop/etc/hadoop/hadoop-env.sh as well.
Get the IP addresses of the master and slave nodes using:
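The exact command is not preserved above; a common way to list a machine's IP addresses (an assumption, not necessarily the command used originally) is:

```
ip addr show
# or simply
hostname -I
```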
Adjust /etc/hosts on all nodes according to your configuration.
Note: When adding a machine's own IP to its /etc/hosts, use that machine's private IP rather than its public IP. For the other machines in the cluster, use their public IPs.
* In the Master node's /etc/hosts file, use the private IP of the Master node and the public IP of the Slave node.
* In the Slave node's /etc/hosts file, use the private IP of the Slave node and the public IP of the Master node.
* Example:
10.0.22.11 node-master.example.com
10.0.3.12 node-slave1.example.com
Creating Hadoop User
Create a hadoop user on every machine in the cluster to follow along with this documentation, or replace the hadoop user in the documentation with your own user.
Log in to the system as the root user.
Create a hadoop user account using the useradd command.
Set a password for the new hadoop user using the passwd command.
Add the hadoop user to the wheel group using the usermod command.
Test that the updated configuration allows the user you created to run commands using sudo.
Use the su command to switch to the new user account that you created.
Use the groups command to verify that the user is in the wheel group.
Use the sudo command to run the whoami command. Since this is the first time you have run a command with sudo from the hadoop user account, a banner message is displayed. You will also be prompted to enter the password for the hadoop account.
The last line of the output is the user name returned by the whoami command. If sudo is configured correctly this value will be root.
You have successfully configured a hadoop user with sudo access. You can now log in to this hadoop account and use sudo to run commands as if you were logged in to the account of the root user.
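A minimal sketch of the steps above, assuming a RHEL/CentOS system where membership in the wheel group grants sudo access:

```
# As root: create the hadoop user, set its password, and add it to the wheel group
useradd hadoop
passwd hadoop
usermod -aG wheel hadoop

# Verify sudo access as the new user
su - hadoop
groups          # should list: hadoop wheel
sudo whoami     # should print: root
```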
Distribute Authentication Key-pairs for the Hadoop User
To manage the cluster, the master node will use an SSH connection with key-pair authentication to connect to the other nodes.
Login to node-master as the hadoop user, and generate an ssh-key:
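The key-generation command itself is not shown above; a typical invocation (an assumption) is:

```
# Generate an RSA key pair for the hadoop user; accept the default file location
ssh-keygen -t rsa -b 4096
```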
id_rsa.pub will contain the generated public key. Copy the public key to all the other nodes, or manually append the contents of the slave node's $HOME/.ssh/id_rsa.pub to the master node's $HOME/.ssh/authorized_keys file and the contents of the master node's $HOME/.ssh/id_rsa.pub to the slave node's $HOME/.ssh/authorized_keys file.
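For example, ssh-copy-id can append the key for you (a sketch, assuming the hostnames used in this guide):

```
# Run on node-master as the hadoop user; repeat in the other direction from the slave
ssh-copy-id hadoop@node-slave1.example.com
```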
Verify ssh from Master node to slave node and vice versa.
Note: If SSH fails, set up the authorized_keys on that machine again.
Download and Unpack Hadoop Binaries
Log in to node-master as the hadoop user, download the Hadoop tarball from the Hadoop project page, and unpack it:
cd
wget https://archive.apache.org/dist/hadoop/core/hadoop-2.8.1/hadoop-2.8.1.tar.gz
tar -xzf hadoop-2.8.1.tar.gz
mv hadoop-2.8.1 hadoop
Set Environment variables in each machine in the cluster
Add the Hadoop binaries to your PATH. Edit /home/hadoop/.bashrc or /home/hadoop/.bash_profile and add the following lines:
export HADOOP_HOME=$HOME/hadoop
export HADOOP_CONF_DIR=$HOME/hadoop/etc/hadoop
export HADOOP_MAPRED_HOME=$HOME/hadoop
export HADOOP_COMMON_HOME=$HOME/hadoop
export HADOOP_HDFS_HOME=$HOME/hadoop
export YARN_HOME=$HOME/hadoop
export PATH=$PATH:$HOME/hadoop/bin
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64/jre
Apply the environment variable changes using the source command: source /home/hadoop/.bashrc or source /home/hadoop/.bash_profile
Configure the Master Node
Configuration will be done on node-master and replicated to other slave nodes.
Set NameNode
Update ~/hadoop/etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node-master.example.com:51000</value>
  </property>
</configuration>
Set path for HDFS
Edit ~/hadoop/etc/hadoop/hdfs-site.xml:
Create the directories it references (see the sketch below).
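The original hdfs-site.xml snippet is not preserved here; a minimal sketch using commonly used properties (the directory paths and replication factor are assumptions matching a one-DataNode cluster):

```
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/data/nameNode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/data/dataNode</value>
  </property>
</configuration>
```

Whatever paths you configure must exist before HDFS is formatted, for example:

```
mkdir -p /home/hadoop/data/nameNode /home/hadoop/data/dataNode
```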
Configure Master
Edit ~/hadoop/etc/hadoop/masters to be:
node-master.example.com
Configure Slaves
Edit ~/hadoop/etc/hadoop/slaves. This file specifies which machines will run DataNodes. Set it to:
node-slave1.example.com
Create duplicate config files on each node
Copy the hadoop binaries to slave nodes:
or copy each configured file to the other nodes
Connect to node-slave1.example.com via SSH. A password isn't required, thanks to the SSH keys copied above:
Unpack the binaries, rename the directory, and exit node-slave1.example.com to get back to node-master.example.com:
Copy the Hadoop configuration files to the slave nodes:
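The copy commands themselves are not preserved above; a minimal sketch, assuming the paths and hostnames used in this guide:

```
# On node-master: copy the tarball to the slave
scp hadoop-2.8.1.tar.gz node-slave1.example.com:/home/hadoop

# On node-slave1 (via ssh): unpack and rename, then return to node-master
ssh node-slave1.example.com
tar -xzf hadoop-2.8.1.tar.gz
mv hadoop-2.8.1 hadoop
exit

# Back on node-master: copy the configuration files to the slave
scp ~/hadoop/etc/hadoop/* node-slave1.example.com:/home/hadoop/hadoop/etc/hadoop/
```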
Format HDFS
HDFS needs to be formatted like any classical file system. On node-master, run the following command:
hdfs namenode -format
Your Hadoop installation is now configured and ready to run.
Start HDFS
Start HDFS by running the following script from node-master:
start-dfs.sh
The start-dfs.sh and stop-dfs.sh scripts are located in hadoop_Installation_Dir/sbin/.
This starts the NameNode and SecondaryNameNode on node-master.example.com, and the DataNode on node-slave1.example.com, according to the configuration in the slaves config file.
Check that every process is running with the jps command on each node. On node-master.example.com you should see the NameNode and SecondaryNameNode processes (the PIDs will differ), and on node-slave1.example.com you should see the DataNode process.
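For reference, jps output similar to the following is expected (the PIDs shown are illustrative only):

```
# On node-master.example.com
$ jps
21422 NameNode
21610 SecondaryNameNode
21789 Jps

# On node-slave1.example.com
$ jps
19728 DataNode
19819 Jps
```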
HDFS has been configured successfully.
Note: If the DataNode or NameNode has not started, check the HDFS logs under $HOME/hadoop/logs/ to debug.
Create HDFS users
To create users for HDFS (regprocessor, prereg, idrepo), run this command:
Note: Configure the user in the module-specific properties file (e.g. pre-registration-qa.properties) as mosip.kernel.fsadapter.hdfs.user-name=prereg
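The original command is not preserved here; one possible approach (an assumption) is to create the accounts as Linux users on the cluster nodes:

```
# Assumption: the HDFS users are created as OS accounts
sudo useradd regprocessor
sudo useradd prereg
sudo useradd idrepo
```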
Create a directory and give permission for each user
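The exact commands are not preserved here; a minimal sketch using standard hdfs dfs commands (the /user directory layout is an assumption):

```
# Create a home directory in HDFS for each user and restrict ownership to that user
hdfs dfs -mkdir -p /user/regprocessor /user/prereg /user/idrepo
hdfs dfs -chown regprocessor:regprocessor /user/regprocessor
hdfs dfs -chown prereg:prereg /user/prereg
hdfs dfs -chown idrepo:idrepo /user/idrepo
hdfs dfs -chmod 755 /user/regprocessor /user/prereg /user/idrepo
```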
Enabling the configured ports through the firewall on each machine in the cluster
Note: If different ports have been configured, enable those ports instead.
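A minimal sketch, assuming firewalld (RHEL/CentOS 7) and the NameNode port 51000 configured in core-site.xml above; add any other ports your configuration uses (e.g. DataNode and web UI ports):

```
sudo firewall-cmd --permanent --add-port=51000/tcp
sudo firewall-cmd --reload
```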
Securing HDFS
The following configuration is required to run HDFS in secure mode. Read more about Kerberos here: link
Install Kerberos
Before installing Kerberos, install the JCE policy files.
Install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files on all cluster and Hadoop user machines. Follow this link
Kerberos
The Kerberos server (KDC) and the Kerberos client need to be installed. Install the client on both the master and slave nodes. The KDC server will be installed on the master node.
To install packages for a Kerberos server:
To install packages for a Kerberos client:
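On RHEL/CentOS these are typically installed with yum (a sketch; package names may differ on other distributions):

```
# Kerberos server packages (master node only)
sudo yum install krb5-server krb5-libs krb5-workstation

# Kerberos client packages (all nodes)
sudo yum install krb5-libs krb5-workstation
```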
Configuring the Master KDC Server
Edit /etc/krb5.conf. Configuration snippets may also be placed in the /etc/krb5.conf.d/ directory (via the includedir directive).
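The krb5.conf contents are not reproduced above; a minimal sketch, assuming the NODE-MASTER.EXAMPLE.COM realm used elsewhere in this guide and the KDC running on node-master:

```
[libdefaults]
  default_realm = NODE-MASTER.EXAMPLE.COM

[realms]
  NODE-MASTER.EXAMPLE.COM = {
    kdc = node-master.example.com
    admin_server = node-master.example.com
  }

[domain_realm]
  .example.com = NODE-MASTER.EXAMPLE.COM
  example.com = NODE-MASTER.EXAMPLE.COM
```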
Note: Place this krb5.conf file in /kernel/kernel-fsadapter-hdfs/src/main/resources and set mosip.kernel.fsadapter.hdfs.krb-file=classpath:krb5.conf. If it is kept outside the resources directory, give the absolute path instead: mosip.kernel.fsadapter.hdfs.krb-file=file:/opt/kdc/krb5.conf
Edit /var/kerberos/krb5kdc/kdc.conf.
Create the database using the kdb5_util utility.
Edit /var/kerberos/krb5kdc/kadm5.acl.
Create the first principal using kadmin.local at the KDC terminal:
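The commands themselves are not preserved above; a sketch, assuming the NODE-MASTER.EXAMPLE.COM realm and the hadoop/admin principal used in the keytab example later in this guide:

```
# Create the KDC database (prompts for a master password)
sudo kdb5_util create -s -r NODE-MASTER.EXAMPLE.COM

# Create the first principal from the KDC terminal
sudo kadmin.local -q "addprinc hadoop/admin"
```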
Start Kerberos using the following commands:
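On a systemd-based system (RHEL/CentOS 7) this would be (a sketch):

```
sudo systemctl start krb5kdc
sudo systemctl start kadmin
```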
To set up the KDC server to auto-start on boot (RHEL/CentOS/Oracle Linux 6):
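A sketch of the auto-start setup; chkconfig applies to RHEL/CentOS 6, while systemd-based systems use systemctl enable:

```
# RHEL/CentOS/Oracle Linux 6
sudo chkconfig krb5kdc on
sudo chkconfig kadmin on

# RHEL/CentOS 7 (systemd)
sudo systemctl enable krb5kdc kadmin
```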
Verify that the KDC is issuing tickets. First, run kinit to obtain a ticket and store it in a credential cache file.
Next, use klist to view the list of credentials in the cache.
Use kdestroy to destroy the cache and the credentials it contains.
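Putting the three verification steps together (a sketch, using the hadoop/admin principal created above):

```
kinit hadoop/admin   # obtain a ticket (prompts for the principal's password)
klist                # list the credentials in the cache
kdestroy             # destroy the cache and the credentials it contains
```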
Create and Deploy the Kerberos Principals and Keytab files
For more information, check here: link
If you have root access to the KDC machine, use kadmin.local; otherwise use kadmin. To start kadmin.local (on the KDC machine), run this command:
sudo kadmin.local
To create the Kerberos principals
Do the following steps on the master node.
In the kadmin.local or kadmin shell, create the hadoop principal. This principal is used for the NameNode, Secondary NameNode, and DataNodes.
Create the HTTP principal.
Create principals for all HDFS users (regprocessor, prereg, idrepo)
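A sketch of these steps inside the kadmin.local (or kadmin) shell, using the principal names that appear in the keytab example below:

```
# Principal used by the NameNode, Secondary NameNode, and DataNodes
kadmin:  addprinc -randkey hadoop/admin

# HTTP principal
kadmin:  addprinc -randkey HTTP/admin

# Principals for the HDFS users
kadmin:  addprinc regprocessor
kadmin:  addprinc prereg
kadmin:  addprinc idrepo
```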
To create the Kerberos keytab files
Create the hadoop keytab file that will contain the hadoop principal and the HTTP principal. This keytab file is used for the NameNode, Secondary NameNode, and DataNodes.
kadmin: xst -norandkey -k hadoop.keytab hadoop/admin HTTP/admin
Use klist to display the keytab file entries; a correctly created keytab file should look something like this:
$ klist -k -e -t hadoop.keytab
Keytab name: FILE:hadoop.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (aes256-cts-hmac-sha1-96)
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (aes128-cts-hmac-sha1-96)
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (des3-cbc-sha1)
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (arcfour-hmac)
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (camellia256-cts-cmac)
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (camellia128-cts-cmac)
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (des-hmac-sha1)
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (des-cbc-md5)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (aes256-cts-hmac-sha1-96)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (aes128-cts-hmac-sha1-96)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (des3-cbc-sha1)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (arcfour-hmac)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (camellia256-cts-cmac)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (camellia128-cts-cmac)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (des-hmac-sha1)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (des-cbc-md5)
Creating a keytab file [mosip.keytab] for the application to authenticate with the HDFS cluster
To view the principals in the keytab
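The exact commands are not preserved here; a sketch, assuming the user principals created above are exported into mosip.keytab and then inspected with klist:

```
# In kadmin.local: export the application users' principals to mosip.keytab
kadmin:  xst -norandkey -k mosip.keytab regprocessor prereg idrepo

# View the principals contained in the keytab
klist -k -e -t mosip.keytab
```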
To deploy the Kerberos keytab file
On every node in the cluster, copy or move the keytab file to a directory that Hadoop can access, such as /home/hadoop/hadoop/etc/hadoop/hadoop.keytab.
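For example (a sketch, assuming the paths and hostnames used in this guide):

```
# Copy the keytab to the slave node and restrict read access to the hadoop user
scp hadoop.keytab node-slave1.example.com:/home/hadoop/hadoop/etc/hadoop/
chmod 400 /home/hadoop/hadoop/etc/hadoop/hadoop.keytab
```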
To configure Kernel HDFS Adapter
Place this mosip.keytab file in /kernel/kernel-fsadapter-hdfs/src/main/resources and update the application properties:
mosip.kernel.fsadapter.hdfs.keytab-file=classpath:mosip.keytab
mosip.kernel.fsadapter.hdfs.authentication-enabled=true
mosip.kernel.fsadapter.hdfs.kdc-domain=NODE-MASTER.EXAMPLE.COM
mosip.kernel.fsadapter.hdfs.name-node-url=hdfs://host-ip:port
Note: Configure the user in the module-specific properties file (example: pre-registration-qa.properties) as mosip.kernel.fsadapter.hdfs.user-name=prereg.
Enable security in HDFS
To enable security in HDFS, you must stop all Hadoop daemons in your cluster and then change some configuration properties:
sh hadoop/sbin/stop-dfs.sh
Enable Hadoop Security
To enable Hadoop security, add the following properties to the ~/hadoop/etc/hadoop/core-site.xml file on every machine in the cluster:
Add the following properties to the ~/hadoop/etc/hadoop/hdfs-site.xml file on every machine in the cluster:
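The property blocks themselves are not reproduced above; the following is a minimal sketch using standard Hadoop security properties, with the keytab path and principals matching the examples in this guide (treat the exact values as assumptions for your environment):

```
<!-- core-site.xml -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/home/hadoop/hadoop/etc/hadoop/hadoop.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hadoop/admin@NODE-MASTER.EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/admin@NODE-MASTER.EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/home/hadoop/hadoop/etc/hadoop/hadoop.keytab</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hadoop/admin@NODE-MASTER.EXAMPLE.COM</value>
</property>
```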
Configuring HTTPS in HDFS
Generating the key and certificate
The first step in deploying HTTPS is to generate the key and the certificate for each machine in the cluster. You can use Java's keytool utility to accomplish this task. Ensure that the first and last name OR common name (CN) matches exactly the fully qualified domain name of the server (e.g. node-master.example.com).
keytool -genkey -alias localhost -keyalg RSA -keysize 2048 -keystore keystore.jks
Creating your own CA
We use openssl to generate a new CA certificate:
openssl req -new -x509 -keyout ca-key.cer -out ca-cert.cer -days 365
The next step is to add the generated CA to the clients' truststore so that the clients can trust this CA:
keytool -keystore truststore.jks -alias CARoot -import -file ca-cert.cer
Signing the certificate:
The next step is to sign all certificates generated with the CA. First, export the certificate from the keystore:
keytool -keystore keystore.jks -alias localhost -certreq -file cert-file.cer
Then sign it with the CA:
openssl x509 -req -CA ca-cert.cer -CAkey ca-key.cer -in cert-file.cer -out cert-signed.cer -days 365 -CAcreateserial -passin pass:12345678
Finally, import both the certificate of the CA and the signed certificate into the keystore:
keytool -keystore keystore.jks -alias CARoot -import -file ca-cert.cer
keytool -keystore keystore.jks -alias localhost -import -file cert-signed.cer
Configuring HDFS
Change the ssl-server.xml and ssl-client.xml on all nodes to tell HDFS about the keystore and the truststore.
Edit ~/hadoop/etc/hadoop/ssl-server.xml
Edit ~/hadoop/etc/hadoop/ssl-client.xml
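The contents of these files are not reproduced above; a minimal sketch using the standard Hadoop SSL properties, assuming the keystore.jks and truststore.jks generated earlier (the paths and passwords are placeholders):

```
<!-- ssl-server.xml -->
<property>
  <name>ssl.server.keystore.location</name>
  <value>/home/hadoop/keystore.jks</value>
</property>
<property>
  <name>ssl.server.keystore.password</name>
  <value>changeit</value>
</property>
<property>
  <name>ssl.server.truststore.location</name>
  <value>/home/hadoop/truststore.jks</value>
</property>
<property>
  <name>ssl.server.truststore.password</name>
  <value>changeit</value>
</property>

<!-- ssl-client.xml -->
<property>
  <name>ssl.client.truststore.location</name>
  <value>/home/hadoop/truststore.jks</value>
</property>
<property>
  <name>ssl.client.truststore.password</name>
  <value>changeit</value>
</property>
```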
After restarting the HDFS daemons (NameNode, DataNode and JournalNode), you should have successfully deployed HTTPS in your HDFS cluster.
If you face errors during the Kerberos setup, check this: link