Steps to Install and Configure HDFS

This documentation describes setting up an HDFS (v2.8.1) cluster with one NameNode and one DataNode.

Setup HDFS version 2.8.1

Prerequisites

  • Create two VMs. They’ll be referred to throughout this guide as:

    node-master.example.com
    node-slave1.example.com
  • Install Java (java-8-openjdk) on all the machines in the cluster and set the JAVA_HOME environment variable on each of them.

    sudo yum install java-1.8.0-openjdk-devel
  • Get your Java installation path.

    update-alternatives --display java

Note: Take the value of the current link and remove the trailing /bin/java.

For example on RHEL 7, the link is /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre/bin/java

So JAVA_HOME should be /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre.

Edit ~/.bashrc:

  • Add export JAVA_HOME={path-to-java}, replacing {path-to-java} with your actual Java installation path.

For example, on RHEL 7 with OpenJDK 8: export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre

Note: In the later steps, after you log in to the hadoop account, also set the Java path in ~/hadoop/etc/hadoop/hadoop-env.sh.

  • Get the IP of the master and slave nodes; a command sketch follows the note below.

  • Adjust /etc/hosts on all nodes according to your configuration.

Note:

When adding a machine's own IP to its /etc/hosts, use that machine's private IP rather than its public IP. For the other machines in the cluster, use their public IPs.

  • In the master node's /etc/hosts, use the private IP of the master node and the public IP of the slave node.
  • In the slave node's /etc/hosts, use the private IP of the slave node and the public IP of the master node.
  • Example:

    10.0.22.11 node-master.example.com
    10.0.3.12 node-slave1.example.com
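
A quick sketch of the two steps above (the IPs shown are the example addresses from the note; yours will differ):

    # Show the machine's IP addresses (run on every node)
    hostname -I
    ip addr show

    # Append the cluster entries to /etc/hosts (follow the private/public IP rule above)
    echo "10.0.22.11 node-master.example.com" | sudo tee -a /etc/hosts
    echo "10.0.3.12  node-slave1.example.com" | sudo tee -a /etc/hosts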

Creating Hadoop User

Create a hadoop user on every machine in the cluster to follow this documentation, or replace the hadoop user in the documentation with your own user. A command sketch follows the checklist below.

  • Log in to the system as the root user.

  • Create a hadoop user account using the useradd command.

  • Set a password for the new hadoop user using the passwd command.

  • Add the hadoop user to the wheel group using the usermod command.

  • Test that the updated configuration allows the user you created to run commands using sudo.

    • Use the su command to switch to the new user account that you created.

    • Use the groups command to verify that the user is in the wheel group.

    • Use the sudo command to run the whoami command. Since this is the first time you have run a command using sudo from the hadoop user account, the banner message will be displayed. You will also be prompted to enter the password for the hadoop account.

    • The last line of the output is the user name returned by the whoami command. If sudo is configured correctly, this value will be root.
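
A minimal sketch of the commands behind the checklist above (run the first three as root; hadoop is the username used throughout this guide):

    useradd hadoop             # create the hadoop user account
    passwd hadoop              # set a password for the new user
    usermod -aG wheel hadoop   # add the user to the wheel group

    su - hadoop                # switch to the new account
    groups                     # should include: wheel
    sudo whoami                # should print: root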

You have successfully configured a hadoop user with sudo access. You can now log in to this hadoop account and use sudo to run commands as if you were logged in to the account of the root user.

Distribute Authentication Key-pairs for the Hadoop User

The master node will use an SSH connection with key-pair authentication to connect to the other nodes and manage the cluster.

  • Log in to node-master as the hadoop user and generate an SSH key (see the sketch after this list):

    id_rsa.pub will contain the generated public key.

  • Copy the public key to all the other nodes.

    or

    Manually append the contents of the slave node's $HOME/.ssh/id_rsa.pub to the master node's $HOME/.ssh/authorized_keys, and append the contents of the master node's $HOME/.ssh/id_rsa.pub to the slave node's $HOME/.ssh/authorized_keys.
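
A sketch of the key generation and distribution, assuming OpenSSH and the hostnames used in this guide:

    # On node-master, as the hadoop user
    ssh-keygen -t rsa -b 4096                    # accept the defaults; the key pair lands in ~/.ssh/

    # Copy the public key to the slave node (repeat from the slave towards the master)
    ssh-copy-id hadoop@node-slave1.example.com

    # Verify password-less login in both directions
    ssh hadoop@node-slave1.example.com hostname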

Verify SSH from the master node to the slave node and vice versa.

Note: If SSH fails, set up the authorized_keys entries on that machine again.

Download and Unpack Hadoop Binaries

Log in to node-master as the hadoop user, download the Hadoop tarball from the Hadoop project page, and unpack it:

    cd
    wget https://archive.apache.org/dist/hadoop/core/hadoop-2.8.1/hadoop-2.8.1.tar.gz
    tar -xzf hadoop-2.8.1.tar.gz
    mv hadoop-2.8.1 hadoop

Set Environment variables in each machine in the cluster

Add the Hadoop binaries to your PATH. Edit /home/hadoop/.bashrc or /home/hadoop/.bash_profile and add the following lines:

    export HADOOP_HOME=$HOME/hadoop
    export HADOOP_CONF_DIR=$HOME/hadoop/etc/hadoop
    export HADOOP_MAPRED_HOME=$HOME/hadoop
    export HADOOP_COMMON_HOME=$HOME/hadoop
    export HADOOP_HDFS_HOME=$HOME/hadoop
    export YARN_HOME=$HOME/hadoop
    export PATH=$PATH:$HOME/hadoop/bin
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64/jre

Apply the environment variable changes using the source command:

    source /home/hadoop/.bashrc

or

    source /home/hadoop/.bash_profile

Configure the Master Node

Configuration will be done on node-master and then replicated to the slave nodes.

Set NameNode

Update ~/hadoop/etc/hadoop/core-site.xml:

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://node-master.example.com:51000</value>
        </property>
    </configuration>

Set path for HDFS

  • Edit ~/hadoop/etc/hadoop/hdfs-site.xml:

  • Create the corresponding data directories (a sketch of both steps follows this list).
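
The configuration values are not spelled out above; the following is a minimal sketch assuming the conventional dfs.namenode.name.dir / dfs.datanode.data.dir properties and data directories under /home/hadoop/data (adjust paths and the replication factor to your setup):

    <configuration>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/hadoop/data/nameNode</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/hadoop/data/dataNode</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>

Then create the directories on the respective nodes:

    mkdir -p /home/hadoop/data/nameNode
    mkdir -p /home/hadoop/data/dataNode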

Configure Master

Edit ~/hadoop/etc/hadoop/masters to contain:

    node-master.example.com

Configure Slaves

Edit ~/hadoop/etc/hadoop/slaves. This file specifies which machines run a DataNode:

    node-slave1.example.com

Create duplicate config files on each node

  • Copy the hadoop binaries to slave nodes:

    or copy each configured file to the other nodes.

  • Connect to node-slave1.example.com via SSH. A password isn’t required, thanks to the SSH keys copied above:

  • Unpack the binaries, rename the directory, and exit node-slave1.example.com to get back to node-master.example.com:

  • Copy the Hadoop configuration files to the slave nodes (a sketch of all these steps follows this list):
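
A sketch of the copy/unpack round trip described above, using the tarball and paths from the earlier steps:

    # On node-master: copy the Hadoop tarball to the slave node
    scp ~/hadoop-2.8.1.tar.gz hadoop@node-slave1.example.com:~

    # Connect to the slave node (no password needed thanks to the SSH keys)
    ssh hadoop@node-slave1.example.com

    # On node-slave1: unpack, rename, and return to node-master
    tar -xzf hadoop-2.8.1.tar.gz
    mv hadoop-2.8.1 hadoop
    exit

    # Back on node-master: copy the configuration files to the slave node
    scp ~/hadoop/etc/hadoop/* hadoop@node-slave1.example.com:~/hadoop/etc/hadoop/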

Format HDFS

HDFS needs to be formatted like any classical file system. On node-master, run the following command:

    hdfs namenode -format

Your Hadoop installation is now configured and ready to run.

Start HDFS

  • Start HDFS by running the following script from node-master:

        start-dfs.sh

    The start-dfs.sh and stop-dfs.sh scripts are located in <hadoop_installation_dir>/sbin/.

It’ll start NameNode and SecondaryNameNode on node-master.example.com, and DataNode on node-slave1.example.com, according to the configuration in the slaves config file.

  • Check that every process is running with the jps command on each node. You should see the NameNode and SecondaryNameNode on node-master.example.com and the DataNode on node-slave1.example.com, as in the illustrative output below.
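
An illustrative sketch of the expected jps output (daemon names follow from the configuration above; PIDs are placeholders and will differ):

    # node-master.example.com
    $ jps
    12001 NameNode
    12233 SecondaryNameNode
    12890 Jps

    # node-slave1.example.com
    $ jps
    8755 DataNode
    9021 Jps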

HDFS has been configured successfully.

Note: If the DataNode and NameNode have not started, look into the HDFS logs under $HOME/hadoop/logs/ to debug.

Create HDFS users

  • To create users for HDFS (regprocessor, prereg, idrepo), run the commands sketched after this list:

Note: Configure the user in the module-specific properties file (e.g. pre-registration-qa.properties) as mosip.kernel.fsadapter.hdfs.user-name=prereg.

  • Create a directory and give permissions for each user.
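
The exact commands are not shown above; a minimal sketch, assuming each user needs an OS account on the NameNode plus an HDFS home directory it owns (run on node-master; adjust to your policy):

    # Repeat for each HDFS user: regprocessor, prereg, idrepo
    for u in regprocessor prereg idrepo; do
        sudo useradd "$u"                         # OS account (skip if it already exists)
        hdfs dfs -mkdir -p /user/"$u"             # the user's directory in HDFS
        hdfs dfs -chown -R "$u":"$u" /user/"$u"   # hand ownership to the user
        hdfs dfs -chmod 755 /user/"$u"            # set directory permissions
    done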

Enabling configured ports through the firewall on each machine in the cluster

Note: If different ports have been configured, enable those ports.
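
The firewall commands are not listed above; a sketch assuming firewalld on RHEL/CentOS 7, the NameNode port configured in core-site.xml, and the Hadoop 2.x default web/data ports (adjust to your configuration):

    # Run on every machine in the cluster
    sudo firewall-cmd --permanent --add-port=51000/tcp   # fs.defaultFS (NameNode RPC, as configured above)
    sudo firewall-cmd --permanent --add-port=50070/tcp   # NameNode web UI (default)
    sudo firewall-cmd --permanent --add-port=50010/tcp   # DataNode data transfer (default)
    sudo firewall-cmd --permanent --add-port=50075/tcp   # DataNode web UI (default)
    sudo firewall-cmd --reload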

Securing HDFS

The following configuration is required to run HDFS in secure mode. Read more about Kerberos here: link

Install Kerberos

Before Installing Kerberos, Install the JCE Policy Files

Install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files on all cluster and Hadoop user machines. Follow this link.

Kerberos

The Kerberos server (KDC) and client need to be installed. Install the client on both the master and slave nodes; the KDC server will be installed on the master node.

  • To install packages for a Kerberos server:

  • To install packages for a Kerberos client (a package sketch for both steps follows this list):
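
The package lists are not shown above; a sketch assuming the standard RHEL/CentOS 7 Kerberos packages:

    # Kerberos server (KDC) packages, on node-master
    sudo yum install krb5-server krb5-libs krb5-workstation

    # Kerberos client packages, on every node
    sudo yum install krb5-workstation krb5-libs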

Configuring the Master KDC Server

  • Edit /etc/krb5.conf. Configuration snippets may also be placed in the included directory (includedir /etc/krb5.conf.d/).

Note: Place this krb5.conf file in /kernel/kernel-fsadapter-hdfs/src/main/resources and set

    mosip.kernel.fsadapter.hdfs.krb-file=classpath:krb5.conf

or, if it is kept outside the resources directory, give the absolute path:

    mosip.kernel.fsadapter.hdfs.krb-file=file:/opt/kdc/krb5.conf

  • Edit /var/kerberos/krb5kdc/kdc.conf

  • Create the database using the kdb5_util utility.

  • Edit the /var/kerberos/krb5kdc/kadm5.acl

  • Create the first principal using kadmin.local at the KDC terminal:

  • Start Kerberos using the following commands:

To set up the KDC server to auto-start on boot, enable the krb5kdc and kadmin services (on RHEL/CentOS/Oracle Linux 6 this is done with chkconfig); see the sketch below.
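
The commands for the steps above are not shown; a sketch assuming MIT Kerberos on RHEL/CentOS 7 and an admin/admin principal (the principal name and realm come from your krb5.conf and kdc.conf):

    # Create the Kerberos database (prompts for a master password)
    sudo kdb5_util create -s

    # Create the first (admin) principal
    sudo kadmin.local -q "addprinc admin/admin"

    # Start the KDC and admin services
    sudo systemctl start krb5kdc kadmin

    # Auto-start on boot (RHEL/CentOS 7); on RHEL 6 use: chkconfig krb5kdc on; chkconfig kadmin on
    sudo systemctl enable krb5kdc kadmin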

  • Verify that the KDC is issuing tickets. First, run kinit to obtain a ticket and store it in a credential cache file.

    Next, use klist to view the list of credentials in the cache.

    Use kdestroy to destroy the cache and the credentials it contains.
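
A sketch of the verification, assuming the admin/admin principal created above:

    kinit admin/admin   # obtain a ticket (prompts for the principal's password)
    klist               # view the credentials in the cache
    kdestroy            # destroy the cache and the credentials it contains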

Create and Deploy the Kerberos Principals and Keytab files

For more information, check here: link

If you have root access to the KDC machine, use kadmin.local; otherwise use kadmin. To start kadmin.local (on the KDC machine), run this command:

    sudo kadmin.local

To create the Kerberos principals

Do the following steps on the master node.

  • In the kadmin.local or kadmin shell, create the hadoop principal. This principal is used for the NameNode, Secondary NameNode, and DataNodes.

  • Create the HTTP principal.

  • Create principals for all HDFS users (regprocessor, prereg, idrepo). A sketch of these three steps follows this list.
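
The addprinc commands are not shown above; a sketch assuming the hadoop/admin and HTTP/admin principal names that appear in the keytab listing below:

    # Inside the kadmin.local or kadmin shell
    addprinc hadoop/admin    # used by the NameNode, Secondary NameNode, and DataNodes
    addprinc HTTP/admin      # used for HTTP/SPNEGO authentication
    addprinc regprocessor
    addprinc prereg
    addprinc idrepo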

To create the Kerberos keytab files

Create the hadoop.keytab file that will contain the hadoop principal and the HTTP principal. This keytab file is used for the NameNode, Secondary NameNode, and DataNodes.

    kadmin: xst -norandkey -k hadoop.keytab hadoop/admin HTTP/admin

Use klist to display the keytab file entries; a correctly created keytab file should look something like this (the realm will match your krb5.conf):

    $ klist -k -e -t hadoop.keytab
    Keytab name: FILE:hadoop.keytab
    KVNO Timestamp           Principal
    ---- ------------------- ------------------------------------------------------
       1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (aes256-cts-hmac-sha1-96)
       1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (aes128-cts-hmac-sha1-96)
       1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (des3-cbc-sha1)
       1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (arcfour-hmac)
       1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (camellia256-cts-cmac)
       1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (camellia128-cts-cmac)
       1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (des-hmac-sha1)
       1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (des-cbc-md5)
       1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (aes256-cts-hmac-sha1-96)
       1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (aes128-cts-hmac-sha1-96)
       1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (des3-cbc-sha1)
       1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (arcfour-hmac)
       1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (camellia256-cts-cmac)
       1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (camellia128-cts-cmac)
       1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (des-hmac-sha1)
       1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (des-cbc-md5)

Creating the keytab file [mosip.keytab] for the application to authenticate with the HDFS cluster

To view the principals in the keytab

To deploy the Kerberos keytab file

On every node in the cluster, copy or move the keytab file to a directory that Hadoop can access, such as /home/hadoop/hadoop/etc/hadoop/hadoop.keytab.
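
The individual commands are not shown above; a consolidated sketch of creating, inspecting, and deploying the keytabs, assuming the principals created earlier and the file names used in this guide:

    # Inside kadmin.local: export the HDFS user principals into mosip.keytab
    xst -norandkey -k mosip.keytab regprocessor prereg idrepo

    # View the principals contained in a keytab
    klist -k -e -t mosip.keytab

    # Deploy the hadoop keytab to every node, into a directory Hadoop can access
    scp hadoop.keytab hadoop@node-slave1.example.com:/home/hadoop/hadoop/etc/hadoop/hadoop.keytab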

To configure Kernel HDFS Adapter

Place this mosip.keytab file in /kernel/kernel-fsadapter-hdfs/src/main/resources and update the application properties:

    mosip.kernel.fsadapter.hdfs.keytab-file=classpath:mosip.keytab
    mosip.kernel.fsadapter.hdfs.authentication-enabled=true
    mosip.kernel.fsadapter.hdfs.kdc-domain=NODE-MASTER.EXAMPLE.COM
    mosip.kernel.fsadapter.hdfs.name-node-url=hdfs://host-ip:port

Note: Configure the user in module specific properties file (example: pre-registration-qa.properties as mosip.kernel.fsadapter.hdfs.user-name=prereg).

Enable security in HDFS

To enable security in HDFS, you must stop all Hadoop daemons in your cluster and then change some configuration properties:

    sh hadoop/sbin/stop-dfs.sh

Enable Hadoop Security

  • To enable Hadoop security, add the following properties to the ~/hadoop/etc/hadoop/core-site.xml file on every machine in the cluster:

  • Add the following properties to the ~/hadoop/etc/hadoop/hdfs-site.xml file on every machine in the cluster. A sketch of both files follows this list.
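
The property lists are not spelled out above; the following is a minimal sketch of the standard Hadoop 2.x Kerberos properties, assuming the hadoop/admin and HTTP/admin principals, the NODE-MASTER.EXAMPLE.COM realm, and the keytab path used earlier (merge these <property> elements into the existing <configuration> element of each file):

    <!-- core-site.xml additions -->
    <property><name>hadoop.security.authentication</name><value>kerberos</value></property>
    <property><name>hadoop.security.authorization</name><value>true</value></property>

    <!-- hdfs-site.xml additions -->
    <property><name>dfs.block.access.token.enable</name><value>true</value></property>
    <property><name>dfs.namenode.keytab.file</name><value>/home/hadoop/hadoop/etc/hadoop/hadoop.keytab</value></property>
    <property><name>dfs.namenode.kerberos.principal</name><value>hadoop/admin@NODE-MASTER.EXAMPLE.COM</value></property>
    <property><name>dfs.namenode.kerberos.internal.spnego.principal</name><value>HTTP/admin@NODE-MASTER.EXAMPLE.COM</value></property>
    <property><name>dfs.secondary.namenode.keytab.file</name><value>/home/hadoop/hadoop/etc/hadoop/hadoop.keytab</value></property>
    <property><name>dfs.secondary.namenode.kerberos.principal</name><value>hadoop/admin@NODE-MASTER.EXAMPLE.COM</value></property>
    <property><name>dfs.datanode.keytab.file</name><value>/home/hadoop/hadoop/etc/hadoop/hadoop.keytab</value></property>
    <property><name>dfs.datanode.kerberos.principal</name><value>hadoop/admin@NODE-MASTER.EXAMPLE.COM</value></property>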

Configuring HTTPS in HDFS

Generating the key and certificate

The first step in deploying HTTPS is to generate the key and the certificate for each machine in the cluster. You can use Java’s keytool utility to accomplish this task. Ensure that the first and last name (i.e. the common name, CN) matches exactly the fully qualified domain name of the server (e.g. node-master.example.com).

    keytool -genkey -alias localhost -keyalg RSA -keysize 2048 -keystore keystore.jks

Creating your own CA

We use openssl to generate a new CA certificate:

    openssl req -new -x509 -keyout ca-key.cer -out ca-cert.cer -days 365

The next step is to add the generated CA to the clients’ truststore so that the clients can trust this CA:

    keytool -keystore truststore.jks -alias CARoot -import -file ca-cert.cer

Signing the certificate:

The next step is to sign all certificates generated with the CA. First, export the certificate signing request from the keystore:

    keytool -keystore keystore.jks -alias localhost -certreq -file cert-file.cer

Then sign it with the CA:

    openssl x509 -req -CA ca-cert.cer -CAkey ca-key.cer -in cert-file.cer -out cert-signed.cer -days 365 -CAcreateserial -passin pass:12345678

Finally, import both the certificate of the CA and the signed certificate into the keystore:

    keytool -keystore keystore.jks -alias CARoot -import -file ca-cert.cer
    keytool -keystore keystore.jks -alias localhost -import -file cert-signed.cer

Configuring HDFS

Change ssl-server.xml and ssl-client.xml on all nodes to tell HDFS about the keystore and the truststore.

  • Edit ~/hadoop/etc/hadoop/ssl-server.xml

  • Edit ~/hadoop/etc/hadoop/ssl-client.xml. A sketch of both files follows this list.
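
The property values are not shown above; a minimal sketch of the standard Hadoop SSL properties, assuming keystore.jks and truststore.jks (generated above) are placed under /home/hadoop/ and that the passwords match the ones you chose with keytool:

    <!-- ~/hadoop/etc/hadoop/ssl-server.xml -->
    <configuration>
        <property><name>ssl.server.keystore.location</name><value>/home/hadoop/keystore.jks</value></property>
        <property><name>ssl.server.keystore.password</name><value>your-keystore-password</value></property>
        <property><name>ssl.server.keystore.keypassword</name><value>your-key-password</value></property>
        <property><name>ssl.server.truststore.location</name><value>/home/hadoop/truststore.jks</value></property>
        <property><name>ssl.server.truststore.password</name><value>your-truststore-password</value></property>
    </configuration>

    <!-- ~/hadoop/etc/hadoop/ssl-client.xml -->
    <configuration>
        <property><name>ssl.client.truststore.location</name><value>/home/hadoop/truststore.jks</value></property>
        <property><name>ssl.client.truststore.password</name><value>your-truststore-password</value></property>
    </configuration>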

After restarting the HDFS daemons (NameNode, DataNode and JournalNode), you should have successfully deployed HTTPS in your HDFS cluster.

If you face errors during Kerberos setup, check this: link
