Importance of the Hadoop Ecosystem
The Hadoop ecosystem is interesting to envision, particularly in how it can be adopted within the realm of DevOps. Hadoop, managed by the Apache Software Foundation, is a powerful open source platform written in Java that is capable of processing large amounts of heterogeneous data. It is designed to scale up from a single server to thousands of machines, each offering local computation and storage, and has become an in-demand technical skill. Hadoop is an Apache top-level project built and used by a global community of contributors and users.
The following sections provide information on the most popular components:
MapReduce:
MapReduce is a framework that makes it easy to write applications that process large amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. A MapReduce program is built around two kinds of tasks:
Map Task:
The input is divided into smaller parts and distributed to the worker nodes. Each worker node solves its own small piece of the problem and returns the answer to the master node.
Reduce Task:
The master node combines all the answers coming from the worker nodes and forms some kind of output, which is the answer to our big distributed problem.
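To make the two phases concrete, here is a minimal word-count sketch using the standard Hadoop MapReduce Java API. The class names and the input/output paths passed on the command line are illustrative, not tied to any particular cluster.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map task: split each input line into words and emit (word, 1) pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce task: combine the counts produced by all mappers for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}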
HDFS:
HDFS is a distributed file system that provides high-throughput access to data. When data is pushed to HDFS, it is split into blocks and replicated across the Data Nodes.
Here are the main components of HDFS:
Name Node:
It maintains the file system namespace of directories and files, and manages the blocks that are present on the Data Nodes.
Data Nodes:
They are the slaves, deployed on each machine, that provide the actual storage. They are responsible for serving read and write requests from clients.
Secondary Name Node:
It is responsible for performing periodic checkpoints. In the event of a Name Node failure, you can restart the Name Node using the latest checkpoint.
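A minimal sketch of writing and reading a file through the HDFS Java API follows; the Name Node address (hdfs://namenode:8020) and the file path are assumptions made only for illustration.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Address of the Name Node; adjust this to your own cluster.
    conf.set("fs.defaultFS", "hdfs://namenode:8020");
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/user/demo/sample.txt");

    // Write: the client asks the Name Node where to place blocks,
    // then streams the data to the Data Nodes.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
    }

    // Read: the Name Node returns the block locations,
    // and the client reads directly from the Data Nodes.
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
      System.out.println(in.readLine());
    }

    fs.close();
  }
}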
Hive:
Hive is part of the Hadoop ecosystem and provides an SQL-like interface to Hadoop. The main building blocks of Hive are:
Metastore: To store the metadata about columns, partitions and the system catalogue.
Driver: To manage the lifecycle of a HiveQL statement.
Query Compiler: To compile HiveQL into a directed acyclic graph of tasks.
Execution Engine: To execute the tasks produced by the compiler in the proper order.
Hive Server: To provide a Thrift interface and a JDBC/ODBC server.
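A minimal sketch of submitting a HiveQL statement through the Hive Server's JDBC interface follows; the connection URL, credentials and the sales table are illustrative assumptions, not part of any particular deployment.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
  public static void main(String[] args) throws Exception {
    // Load the HiveServer2 JDBC driver.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    // HiveServer2 JDBC URL; adjust host, port and database for your cluster.
    String url = "jdbc:hive2://hiveserver:10000/default";
    try (Connection conn = DriverManager.getConnection(url, "hive", "");
         Statement stmt = conn.createStatement()) {
      // The Driver manages this statement's lifecycle, the Query Compiler turns
      // it into a DAG of tasks, and the Execution Engine runs them in order.
      ResultSet rs = stmt.executeQuery(
          "SELECT category, COUNT(*) FROM sales GROUP BY category");
      while (rs.next()) {
        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
      }
    }
  }
}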
HBase:
As said earlier, HDFS works on a write-once, read-many-times pattern, but this is not always the case. HBase is a distributed, column-oriented database built on top of HDFS.
Here are the main components of HBase:
HBase Master: It is responsible for negotiating load balancing across all Region Servers and maintains the state of the cluster. It is not part of the actual data storage or retrieval path.
Region Server: It is deployed on each machine, hosts the data and processes I/O requests.
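A minimal sketch of a put and a get against HBase using the HBase Java client follows; the users table and its info column family are illustrative assumptions.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBasePutGet {
  public static void main(String[] args) throws Exception {
    // Cluster settings such as the ZooKeeper quorum are read from hbase-site.xml.
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("users"))) {

      // Write one cell: row key "user1", column family "info", qualifier "name".
      Put put = new Put(Bytes.toBytes("user1"));
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
      table.put(put);

      // Read it back; the client talks to the Region Server hosting the row,
      // not to the HBase Master.
      Result result = table.get(new Get(Bytes.toBytes("user1")));
      System.out.println(Bytes.toString(
          result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
    }
  }
}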