This documentation describes setting up an HDFS (v2.8.1) cluster with one NameNode and one DataNode.
Create 2 VMs. They'll be referred to throughout this guide as:
Install Java (java-8-openjdk) on all the machines in the cluster and set up the JAVA_HOME environment variable on each of them.
Get your Java installation path.
Note: Take the value of the current link and remove the trailing /bin/java. For example, on RHEL 7 the link is /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre/bin/java, so JAVA_HOME should be /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre.
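For example, one way to derive this path (a sketch; the exact command may vary by distribution):

```
# Resolve the real path of the java binary and strip the trailing /bin/java
readlink -f $(which java) | sed 's|/bin/java$||'
```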
Export JAVA_HOME={path-to-java} with your actual Java installation path. For example, with the OpenJDK 8 path from the RHEL 7 example above: export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre
Note: In the later steps, after you log in to the hadoop account, set the Java path in ~/hadoop/etc/hadoop/hadoop-env.sh as well.
Get the IP addresses of the master and slave nodes.
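For example, either of these standard commands will print the machine's addresses:

```
ip addr show     # detailed interface information
hostname -I      # just the assigned IP addresses
```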
Adjust /etc/hosts on all nodes according to your configuration.
Note: While adding a machine's own IP to its /etc/hosts, use that machine's private IP instead of the public IP. For the other machines in the cluster, use their public IPs.
* Edit the Master node VM's /etc/hosts file using the private IP of the Master node and the public IP of the Slave node.
* Edit the Slave node VM's /etc/hosts file using the private IP of the Slave node and the public IP of the Master node.
* Example:
  10.0.22.11 node-master.example.com
  10.0.3.12 node-slave1.example.com
Create a hadoop user on every machine in the cluster to follow this documentation, or replace the hadoop user in the documentation with your own user.
Log in to the system as the root user.
Create a hadoop user account using the useradd command.
Set a password for the new hadoop user using the passwd command.
Add the hadoop user to the wheel group using the usermod command.
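For example, the three steps above look like this when run as root (a sketch):

```
useradd hadoop             # create the hadoop user account
passwd hadoop              # set its password (you will be prompted)
usermod -aG wheel hadoop   # add the user to the wheel group for sudo access
```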
Test that the updated configuration allows the user you created to run commands using sudo.
Use su to switch to the new user account that you created.
Use groups to verify that the user is in the wheel group.
Use the sudo command to run the whoami command. As this is the first time you have run a command using sudo from the hadoop user account, the banner message will be displayed. You will also be prompted to enter the password for the hadoop account.
The last line of the output is the user name returned by the whoami command. If sudo is configured correctly, this value will be root.
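A quick verification session might look like this (the exact banner text will vary):

```
su - hadoop     # switch to the new account
groups          # should list: hadoop wheel
sudo whoami     # prompts for the hadoop password; last line should be: root
```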
You have successfully configured a hadoop user with sudo access. You can now log in to this hadoop account and use sudo to run commands as if you were logged in to the account of the root user.
To manage the cluster, the master node will use an SSH connection with key-pair authentication to connect to the other nodes.
Log in to node-master as the hadoop user and generate an SSH key; id_rsa.pub will contain the generated public key.
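For example (a sketch; accept the default file location, and use an empty passphrase unless your policy says otherwise):

```
ssh-keygen -t rsa -b 4096
```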
Copy the public key to all the other nodes.
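For example, using ssh-copy-id (assuming the hadoop user already exists on the slave node):

```
ssh-copy-id hadoop@node-slave1.example.com
```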
or manually append the contents of the slave node's $HOME/.ssh/id_rsa.pub to the master node's $HOME/.ssh/authorized_keys file, and the contents of the master node's $HOME/.ssh/id_rsa.pub to the slave node's $HOME/.ssh/authorized_keys file.
Note: if ssh fails, try setting up authorized_keys on the machine again.
Log in to node-master as the hadoop user, download the Hadoop tarball from the Hadoop project page, and unpack it:

```
cd
wget https://archive.apache.org/dist/hadoop/core/hadoop-2.8.1/hadoop-2.8.1.tar.gz
tar -xzf hadoop-2.8.1.tar.gz
mv hadoop-2.8.1 hadoop
```
Add the Hadoop binaries to your PATH. Edit /home/hadoop/.bashrc or /home/hadoop/.bash_profile and add the following lines:

```
export HADOOP_HOME=$HOME/hadoop
export HADOOP_CONF_DIR=$HOME/hadoop/etc/hadoop
export HADOOP_MAPRED_HOME=$HOME/hadoop
export HADOOP_COMMON_HOME=$HOME/hadoop
export HADOOP_HDFS_HOME=$HOME/hadoop
export YARN_HOME=$HOME/hadoop
export PATH=$PATH:$HOME/hadoop/bin
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64/jre
```
Apply the environment variable changes using the source command: source /home/hadoop/.bashrc or source /home/hadoop/.bash_profile
Configuration will be done on node-master and replicated to other slave nodes.
Update ~/hadoop/etc/hadoop/core-site.xml:

```
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node-master.example:51000</value>
  </property>
</configuration>
```
Edit ~/hadoop/etc/hadoop/hdfs-site.xml.
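A minimal sketch for this one-NameNode/one-DataNode setup, assuming the data directories /home/hadoop/data/nameNode and /home/hadoop/data/dataNode (adjust the paths to your environment):

```
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/data/nameNode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/data/dataNode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```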
Create the directories referenced in hdfs-site.xml.
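For example, matching the assumed paths above:

```
mkdir -p /home/hadoop/data/nameNode /home/hadoop/data/dataNode
```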
Edit ~/hadoop/etc/hadoop/masters to be: node-master.example.com
Edit ~/hadoop/etc/hadoop/slaves to be: node-slave1.example.com. The slaves file specifies on which machines the DataNodes will run.
Copy the Hadoop binaries to the slave nodes:
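For example (assuming the tarball is still in the hadoop user's home directory on node-master):

```
scp hadoop-2.8.1.tar.gz hadoop@node-slave1.example.com:/home/hadoop/
```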
or copy each of the configured files to the other nodes.
Connect to node-slave1.example.com via ssh. A password isn't required, thanks to the SSH keys copied above.
Unzip the binaries, rename the directory, and exit node-slave1.example.com to get back on the node-master.example.com:
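A sketch of these two steps combined:

```
ssh hadoop@node-slave1.example.com   # no password prompt expected
tar -xzf hadoop-2.8.1.tar.gz
mv hadoop-2.8.1 hadoop
exit
```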
Copy the Hadoop configuration files to the slave nodes:
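For example, copying the whole configuration directory (a sketch):

```
scp -r /home/hadoop/hadoop/etc/hadoop/* hadoop@node-slave1.example.com:/home/hadoop/hadoop/etc/hadoop/
```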
HDFS needs to be formatted like any classical file system. On node-master, run the following command: hdfs namenode -format
Your Hadoop installation is now configured and ready to run.
Start HDFS by running the following script from node-master: start-dfs.sh. The start-dfs.sh and stop-dfs.sh script files are located in hadoop_installation_dir/sbin/ (i.e. $HOME/hadoop/sbin/ in this setup). It'll start the NameNode and SecondaryNameNode on node-master.example.com, and a DataNode on node-slave1.example.com, according to the configuration in the slaves config file.
Check that every process is running with the jps command on each node, on both node-master.example.com and node-slave1.example.com (PIDs will be different).
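Illustrative output (the PIDs shown are placeholders): the master should show the NameNode and SecondaryNameNode, and the slave a DataNode.

```
# on node-master.example.com
21603 NameNode
21787 SecondaryNameNode
21922 Jps

# on node-slave1.example.com
19728 DataNode
19819 Jps
```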
HDFS has now been configured successfully.
Note: If the DataNode or NameNode has not started, look into the HDFS logs to debug: $HOME/hadoop/logs/
To create users for hdfs (regprocessor, prereg, idrepo), run this command:
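For example, creating them as local users on the cluster machines (a sketch; adapt to how your deployment manages users):

```
sudo useradd regprocessor
sudo useradd prereg
sudo useradd idrepo
```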
Note: Configure the user in the module-specific properties file (e.g. pre-registration-qa.properties) as mosip.kernel.fsadapter.hdfs.user-name=prereg
Create a directory and give permissions for each user.
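For example, creating an HDFS home directory per user (a sketch, run as the hadoop superuser):

```
hdfs dfs -mkdir -p /user/regprocessor
hdfs dfs -chown -R regprocessor:regprocessor /user/regprocessor
# repeat for prereg and idrepo
```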
Note: If a different port has been configured, enable that port.
The following configuration is required to run HDFS in secure mode. Read more about Kerberos here: link
Install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files on all cluster machines and Hadoop user machines. Follow this link
The Kerberos server (KDC) and client need to be installed. Install the client on both the master and slave nodes; the KDC server will be installed on the master node.
To install packages for a Kerberos server:
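On RHEL/CentOS, for example, something like:

```
sudo yum install krb5-server krb5-libs krb5-workstation
```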
To install packages for a Kerberos client:
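On RHEL/CentOS, for example, something like:

```
sudo yum install krb5-libs krb5-workstation
```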
Edit /etc/krb5.conf. Configuration snippets may also be placed in the /etc/krb5.conf.d/ directory (via includedir /etc/krb5.conf.d/).
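A minimal sketch of krb5.conf, assuming the realm NODE-MASTER.EXAMPLE.COM with node-master.example.com acting as both KDC and admin server (adjust to your realm and hostnames):

```
includedir /etc/krb5.conf.d/

[libdefaults]
  default_realm = NODE-MASTER.EXAMPLE.COM
  dns_lookup_realm = false
  dns_lookup_kdc = false
  ticket_lifetime = 24h
  renew_lifetime = 7d
  forwardable = true

[realms]
  NODE-MASTER.EXAMPLE.COM = {
    kdc = node-master.example.com
    admin_server = node-master.example.com
  }

[domain_realm]
  .example.com = NODE-MASTER.EXAMPLE.COM
  example.com = NODE-MASTER.EXAMPLE.COM
```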
Note: Place this krb5.conf file in /kernel/kernel-fsadapter-hdfs/src/main/resources and set mosip.kernel.fsadapter.hdfs.krb-file=classpath:krb5.conf. Or, if it is kept outside the resources directory, give the absolute path: mosip.kernel.fsadapter.hdfs.krb-file=file:/opt/kdc/krb5.conf
Edit /var/kerberos/krb5kdc/kdc.conf
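A sketch using the same assumed realm (the stock RHEL kdc.conf needs little more than the realm name changed; the enctype list mirrors the keytab output shown later):

```
[kdcdefaults]
  kdc_ports = 88
  kdc_tcp_ports = 88

[realms]
  NODE-MASTER.EXAMPLE.COM = {
    acl_file = /var/kerberos/krb5kdc/kadm5.acl
    dict_file = /usr/share/dict/words
    admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
    supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal
  }
```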
Create the database using the kdb5_util utility.
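For example (you will be prompted for a master database password; -s stores a stash file):

```
sudo kdb5_util create -s
```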
Edit the /var/kerberos/krb5kdc/kadm5.acl
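For example, granting full privileges to every admin principal in the assumed realm:

```
*/admin@NODE-MASTER.EXAMPLE.COM *
```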
Create the first principal using kadmin.local at the KDC terminal:
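For example (root/admin is an assumed admin principal name; choose your own):

```
sudo kadmin.local -q "addprinc root/admin"
```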
Start Kerberos using the following commands:
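On RHEL/CentOS 7, for example:

```
sudo systemctl start krb5kdc
sudo systemctl start kadmin
```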
To set up the KDC server to auto-start on boot:
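For example:

```
# RHEL/CentOS/Oracle Linux 6
sudo chkconfig krb5kdc on
sudo chkconfig kadmin on

# RHEL/CentOS 7
sudo systemctl enable krb5kdc kadmin
```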
Verify that the KDC is issuing tickets. First, run kinit to obtain a ticket and store it in a credential cache file.
Next, use klist to view the list of credentials in the cache.
Use kdestroy to destroy the cache and the credentials it contains.
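For example, using the admin principal assumed above:

```
kinit root/admin   # obtain a ticket (prompts for the principal's password)
klist              # view the credentials in the cache
kdestroy           # destroy the cache and the credentials it contains
```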
For more information, check here: link
If you have root access to the KDC machine, use kadmin.local; otherwise use kadmin. To start kadmin.local (on the KDC machine), run this command: sudo kadmin.local
Do the following steps on the master node.
In the kadmin.local or kadmin shell, create the hadoop principal. This principal is used for the NameNode, Secondary NameNode, and DataNodes.
Create the HTTP principal.
Create principals for all the HDFS users (regprocessor, prereg, idrepo).
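For example, in the kadmin shell the principals from the three steps above might be created like this (you will be prompted for a password for each; hadoop/admin and HTTP/admin match the keytab created below, while the plain user principal names are an assumption):

```
kadmin: addprinc hadoop/admin
kadmin: addprinc HTTP/admin
kadmin: addprinc regprocessor
kadmin: addprinc prereg
kadmin: addprinc idrepo
```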
Create the hdfs keytab file that will contain the hdfs principal and the HTTP principal. This keytab file is used for the NameNode, Secondary NameNode, and DataNodes:

```
kadmin: xst -norandkey -k hadoop.keytab hadoop/admin HTTP/admin
```
Use klist to display the keytab file entries; a correctly created hdfs keytab file should look something like this:

```
$ klist -k -e -t hadoop.keytab
Keytab name: FILE:hadoop.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (aes256-cts-hmac-sha1-96)
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (aes128-cts-hmac-sha1-96)
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (des3-cbc-sha1)
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (arcfour-hmac)
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (camellia256-cts-cmac)
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (camellia128-cts-cmac)
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (des-hmac-sha1)
   1 02/11/2019 08:53:51 hadoop/admin@NODE-MASTER.EXAMPLE.COM (des-cbc-md5)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (aes256-cts-hmac-sha1-96)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (aes128-cts-hmac-sha1-96)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (des3-cbc-sha1)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (arcfour-hmac)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (camellia256-cts-cmac)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (camellia128-cts-cmac)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (des-hmac-sha1)
   1 02/11/2019 08:53:51 HTTP/admin@NODE-MASTER.EXAMPLE.COM (des-cbc-md5)
```
Create the keytab file [mosip.keytab] for the application to authenticate with the HDFS cluster.
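For example, exporting the HDFS user principals created above into mosip.keytab (a sketch; the exact principal list is an assumption):

```
kadmin: xst -norandkey -k mosip.keytab regprocessor prereg idrepo
```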
To view the principals in the keytab, use klist as shown above.
On every node in the cluster, copy or move the keytab file to a directory that Hadoop can access, such as /home/hadoop/hadoop/etc/hadoop/hadoop.keytab.
Place this mosip.keytab file in /kernel/kernel-fsadapter-hdfs/src/main/resources and update the application properties:

```
mosip.kernel.fsadapter.hdfs.keytab-file=classpath:mosip.keytab
mosip.kernel.fsadapter.hdfs.authentication-enabled=true
mosip.kernel.fsadapter.hdfs.kdc-domain=NODE-MASTER.EXAMPLE.COM
mosip.kernel.fsadapter.hdfs.name-node-url=hdfs://host-ip:port
```
Note: Configure the user in the module-specific properties file (for example, pre-registration-qa.properties) as mosip.kernel.fsadapter.hdfs.user-name=prereg.
To enable security in HDFS, you must stop all Hadoop daemons in your cluster and then change some configuration properties: sh hadoop/sbin/stop-dfs.sh
To enable Hadoop security, add the following properties to the ~/hadoop/etc/hadoop/core-site.xml
file on every machine in the cluster:
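These are the standard Hadoop Kerberos switches (a sketch):

```
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```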
Add the following properties to the ~/hadoop/etc/hadoop/hdfs-site.xml
file on every machine in the cluster.
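A sketch of the commonly required properties, assuming the hadoop.keytab placed at /home/hadoop/hadoop/etc/hadoop/hadoop.keytab and the hadoop/admin and HTTP/admin principals created above (adjust to your own keytab path and principal names):

```
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/home/hadoop/hadoop/etc/hadoop/hadoop.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hadoop/admin@NODE-MASTER.EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/admin@NODE-MASTER.EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.secondary.namenode.keytab.file</name>
  <value>/home/hadoop/hadoop/etc/hadoop/hadoop.keytab</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.principal</name>
  <value>hadoop/admin@NODE-MASTER.EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/home/hadoop/hadoop/etc/hadoop/hadoop.keytab</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hadoop/admin@NODE-MASTER.EXAMPLE.COM</value>
</property>
```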
The first step of deploying HTTPS is to generate the key and the certificate for each machine in the cluster. You can use Java's keytool utility to accomplish this task. Ensure that the first and last name, i.e. the common name (CN), exactly matches the fully qualified domain name of the server (e.g. node-master.example.com): keytool -genkey -alias localhost -keyalg RSA -keysize 2048 -keystore keystore.jks
We use openssl to generate a new CA certificate: openssl req -new -x509 -keyout ca-key.cer -out ca-cert.cer -days 365
The next step is to add the generated CA to the clients’ truststore so that the clients can trust this CA: keytool -keystore truststore.jks -alias CARoot -import -file ca-cert.cer
The next step is to sign all certificates generated with the CA. First, you need to export the certificate from the keystore: keytool -keystore keystore.jks -alias localhost -certreq -file cert-file.cer
Then sign it with the CA: openssl x509 -req -CA ca-cert.cer -CAkey ca-key.cer -in cert-file.cer -out cert-signed.cer -days 365 -CAcreateserial -passin pass:12345678
Finally, you need to import both the certificate of the CA and the signed certificate into the keystore:

```
keytool -keystore keystore.jks -alias CARoot -import -file ca-cert.cer
keytool -keystore keystore.jks -alias localhost -import -file cert-signed.cer
```
Change ssl-server.xml and ssl-client.xml on all nodes to tell HDFS about the keystore and the truststore.
Edit ~/hadoop/etc/hadoop/ssl-server.xml
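A sketch, assuming keystore.jks and truststore.jks are placed in the Hadoop configuration directory and 12345678 is the store password you chose (both the paths and the password are assumptions):

```
<configuration>
  <property>
    <name>ssl.server.keystore.location</name>
    <value>/home/hadoop/hadoop/etc/hadoop/keystore.jks</value>
  </property>
  <property>
    <name>ssl.server.keystore.password</name>
    <value>12345678</value>
  </property>
  <property>
    <name>ssl.server.keystore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.server.truststore.location</name>
    <value>/home/hadoop/hadoop/etc/hadoop/truststore.jks</value>
  </property>
  <property>
    <name>ssl.server.truststore.password</name>
    <value>12345678</value>
  </property>
  <property>
    <name>ssl.server.truststore.type</name>
    <value>jks</value>
  </property>
</configuration>
```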
Edit ~/hadoop/etc/hadoop/ssl-client.xml
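A matching sketch for the client side, under the same assumptions:

```
<configuration>
  <property>
    <name>ssl.client.truststore.location</name>
    <value>/home/hadoop/hadoop/etc/hadoop/truststore.jks</value>
  </property>
  <property>
    <name>ssl.client.truststore.password</name>
    <value>12345678</value>
  </property>
  <property>
    <name>ssl.client.truststore.type</name>
    <value>jks</value>
  </property>
</configuration>
```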
After restarting the HDFS daemons (NameNode, DataNode and JournalNode), you should have successfully deployed HTTPS in your HDFS cluster.
If you face errors during Kerberos setup, check this: link